Google DeepMind’s CodeMender automates vulnerability fixes, patches 72 open-source flaws in six months

DATE: 10/6/2025 · STATUS: LIVE

CodeMender quietly fixes dangerous bugs across open-source projects, pushing faster patches and raising surprising questions about who controls software security next…

Google DeepMind has deployed an autonomous AI agent that seeks out and repairs serious security flaws in software code. Called CodeMender, the system has submitted 72 security fixes to established open-source projects over the past six months.

Finding and patching vulnerabilities remains a difficult and time-consuming task, even when teams use automated techniques like fuzzing. Google DeepMind’s own research efforts, including AI projects such as Big Sleep and OSS-Fuzz, have been successful at revealing new zero-day flaws in code that had already been audited. That success has produced a fresh bottleneck: as AI speeds up the identification of bugs, human developers face more pressure to write fixes at the same pace.

CodeMender was built to address that imbalance. The agent operates autonomously and works both reactively and proactively. It can respond to newly discovered issues by producing patches quickly, and it can rewrite portions of existing code to remove whole categories of vulnerabilities before an attacker can exploit them. That frees maintainers and developers to spend more time on features and overall software quality.

Under the hood, CodeMender relies on the advanced reasoning of Google’s recent Gemini Deep Think models. That foundation gives the agent the capacity to debug and resolve complex security problems with a high level of independence. The system is fitted with a toolset that lets it analyze and reason about code before making edits. A separate validation pipeline checks that any modification actually fixes the root cause, remains functionally correct, does not break existing tests, and follows the project’s coding style. Only patches that pass these checks are presented for human review.

Mistakes in security patches can carry serious consequences, so the agent’s automatic validation framework is a central element of its design. The framework runs the proposed change through a chain of verification steps: it confirms the underlying bug has been addressed, verifies behavior against test suites, and checks for regressions or style violations. That filtering reduces the chance that low-quality patches reach maintainers.
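The chain of checks described above can be pictured as a simple gate. The sketch below is purely illustrative and is not DeepMind's code, which has not been published: the `Patch` fields, the check names, and the `validate` and `ready_for_review` helpers are all hypothetical, and each real check would involve running tools rather than reading a boolean.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for a candidate patch. In a real system each flag
# would be the outcome of running a reproducer, a test suite, or a linter.
@dataclass
class Patch:
    fixes_root_cause: bool
    tests_pass: bool
    style_ok: bool

Check = Tuple[str, Callable[[Patch], bool]]

# Ordered chain of checks, mirroring the steps named in the article.
CHECKS: List[Check] = [
    ("root cause addressed", lambda p: p.fixes_root_cause),
    ("test suite passes", lambda p: p.tests_pass),
    ("style conforms", lambda p: p.style_ok),
]

def validate(patch: Patch, checks: List[Check] = CHECKS) -> Optional[str]:
    """Return the name of the first failed check, or None if all pass."""
    for name, check in checks:
        if not check(patch):
            return name
    return None

def ready_for_review(patches: List[Patch]) -> List[Patch]:
    """Keep only patches that clear every check in the chain."""
    return [p for p in patches if validate(p) is None]
```

Returning the name of the first failed check, rather than a bare pass or fail, gives whatever produced the patch something concrete to react to on its next attempt.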

DeepMind developed new techniques to improve the agent’s ability to produce safe, correct fixes. CodeMender uses advanced program analysis and a toolbox that includes static and dynamic analysis, differential testing, fuzzing, and SMT solvers. Those tools let the agent inspect control flow, data flow, and common code patterns to identify root causes of security weaknesses and architectural problems.
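Of the techniques in that toolbox, differential testing is the easiest to show in a few lines: run the original and the modified code on the same inputs and flag any disagreement. The sketch below is a generic illustration, not a description of how CodeMender applies it. The two `clamp` functions and the `differential_test` helper are invented for the example.

```python
import random

def reference_clamp(value, low, high):
    """Original implementation, treated as the behavioral baseline."""
    return max(low, min(value, high))

def patched_clamp(value, low, high):
    """Rewritten implementation that should behave identically."""
    if value < low:
        return low
    if value > high:
        return high
    return value

def differential_test(f, g, trials=1000, seed=0):
    """Return the first input on which f and g disagree, or None."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        low = rng.randint(-100, 100)
        high = rng.randint(low, 100)  # guarantees low <= high
        value = rng.randint(-200, 200)
        if f(value, low, high) != g(value, low, high):
            return (value, low, high)
    return None
```

A `None` result only means no disagreement was found on the sampled inputs. It is evidence of equivalent behavior, not proof.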

The agent runs inside a multi-agent architecture made up of specialized components that handle different parts of a repair workflow. For example, a critique agent powered by a large language model highlights differences between the original and modified code. That critique helps the main agent verify that changes do not introduce unintended side effects, and gives it the chance to revise its approach when feedback suggests a correction is needed.
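The article does not say how the critique agent computes or receives those differences. As a rough stand-in, the sketch below uses Python's standard `difflib` module to extract the changed lines between two versions of a file, the kind of input a reviewer, human or model, might be handed. The `changed_lines` helper and the sample strings are hypothetical, and a plain textual diff is a much simpler technique than an LLM-based critique.

```python
import difflib
from typing import List

def changed_lines(original: str, modified: str) -> List[str]:
    """Return only the added ('+') and removed ('-') lines between two texts."""
    diff = list(difflib.unified_diff(
        original.splitlines(),
        modified.splitlines(),
        fromfile="original",
        tofile="modified",
        lineterm="",
        n=0,  # no surrounding context lines
    ))
    # The first two entries are the '---'/'+++' file headers; hunk markers
    # start with '@@' and are dropped by the prefix filter.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]

before = "n = length\ncopy(dst, src, n)"
after = "n = min(length, capacity)\ncopy(dst, src, n)"
summary = changed_lines(before, after)
```

Here `summary` holds the removed line followed by its replacement, while the unchanged `copy` call is left out.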

Concrete repairs by CodeMender show how the system works in practice. In one case, a crash report pointed to a heap buffer overflow. The final patch required changing only a few lines, but finding the true cause took deeper inspection. Using a debugger and code search tools, the agent traced the error to incorrect stack management of Extensible Markup Language (XML) elements during parsing, in a different part of the codebase. In a separate example, the agent produced a non-trivial fix for a complex object lifetime bug by altering a custom system that generates C code inside the target project.

Beyond reacting to reported bugs, the agent can harden code to reduce the chance of future exploits. DeepMind deployed CodeMender to apply -fbounds-safety annotations to sections of libwebp, a widely used image compression library. Those annotations instruct the compiler to insert bounds checks, limiting the ways an attacker could exploit a buffer overflow to run arbitrary code.

That work is relevant because a heap buffer overflow in libwebp, tracked as CVE-2023-4863, was previously used by an attacker in a zero-click iOS exploit. DeepMind says that, with the -fbounds-safety annotations applied, that particular vulnerability and most similar overflows in the annotated parts would have been rendered unexploitable.

The agent’s proactive fixes follow a decision-making routine that handles the fallout of its own edits. When an annotation or code change creates new compilation errors or test failures, CodeMender can automatically try corrective edits. If the validation pipeline finds that a change breaks functionality, the agent revises its approach and attempts alternative fixes until the tests pass.
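That revise-and-retry behavior amounts to a loop over candidate fixes gated by validation. The sketch below shows the general shape only. The `repair` function, its parameters, and the `max_attempts` cap are assumptions made for the example; the article describes retrying until the tests pass and does not mention an attempt limit.

```python
from itertools import islice
from typing import Callable, Iterable, Optional

def repair(
    candidates: Iterable[str],
    passes_validation: Callable[[str], bool],
    max_attempts: int = 5,  # assumed safety cap, not from the article
) -> Optional[str]:
    """Return the first candidate fix that validates, or None if none does."""
    for candidate in islice(candidates, max_attempts):
        if passes_validation(candidate):
            return candidate
    return None

# Usage: the second hypothetical candidate is the first to pass.
chosen = repair(["fix-a", "fix-b", "fix-c"], lambda c: c == "fix-b")
```

Passing `candidates` as an iterable means it can be a generator that produces each new attempt lazily, only after the previous one has failed.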

Google DeepMind is taking a cautious path to wider deployment, keeping reliability and human oversight front and center. Right now, every patch produced by CodeMender is inspected by human researchers before it is submitted to an open-source repository. The team is increasing the rate of submissions gradually to keep quality high and to collect structured feedback from the open-source community.

Looking ahead, researchers plan to contact maintainers of critical open-source projects with CodeMender-generated patches and iterate based on their responses. The team intends to publish technical papers and reports in the coming months that describe methods and results. The work marks initial steps in exploring how AI agents can proactively fix code and improve software security for a broad set of projects.
