What is it about?

Software bugs are inevitable in development, often causing system failures and consuming vast amounts of time to fix. Automated Program Repair (APR) aims to use Artificial Intelligence to identify and repair these errors automatically. However, many existing AI approaches rely on a "brute force" strategy, generating thousands of potential fixes for a single bug, which is computationally expensive and overwhelming for developers to review. This study investigates a more balanced, developer-friendly approach. Instead of generating thousands of solutions, we restrict the AI to a maximum of 10 attempts per bug. We utilize "instruction-tuned" Large Language Models (like Llama 3 and DeepSeek) and test whether it is better to generate several guesses at once or to use an iterative process where the AI learns from error messages to refine its code. We also explore how much training data is actually needed to make these models effective at fixing bugs.
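
To make the iterative setting concrete, the repair loop under a fixed budget can be pictured roughly as below. This is only an illustrative sketch, not the paper's actual pipeline: the helper names query_model and run_test_suite are hypothetical placeholders for the LLM call and the compile-and-test step, and the prompt wording is invented. In the non-iterative alternative, all 10 candidate patches would instead be sampled from the initial prompt in one go.

```python
# Illustrative sketch only: query_model and run_test_suite are hypothetical
# placeholders, not the paper's actual implementation.

MAX_ATTEMPTS = 10  # strict budget: at most 10 candidate patches per bug


def query_model(prompt: str) -> str:
    """Placeholder for one call to an instruction-tuned LLM (e.g. Llama 3, DeepSeek)."""
    raise NotImplementedError


def run_test_suite(patch: str) -> tuple[bool, str]:
    """Placeholder: apply the patch, compile, run the tests; return (passed, error_log)."""
    raise NotImplementedError


def iterative_repair(buggy_code: str) -> str | None:
    """Try up to MAX_ATTEMPTS patches, feeding error messages back into the prompt."""
    prompt = f"Fix the following buggy function:\n{buggy_code}"
    for _ in range(MAX_ATTEMPTS):
        patch = query_model(prompt)
        passed, error_log = run_test_suite(patch)
        if passed:
            return patch  # a plausible fix found within the budget
        # Iterative refinement: feed the error output back into the next prompt.
        prompt = (
            f"Fix the following buggy function:\n{buggy_code}\n"
            f"A previous attempt failed with this error:\n{error_log}\n"
            f"Provide a corrected version."
        )
    return None  # budget exhausted without a plausible patch
```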

Why is it important?

This work is unique because it prioritizes practical usability and resource efficiency over raw repair counts. By limiting the AI to a strict budget of 10 patches per bug, we simulate the real-world constraints of developers who cannot sift through endless suggestions. Our findings challenge the prevailing belief that "more data is always better": we demonstrate that fine-tuning models on a very small dataset (less than 1% of the available data) can yield performance improvements of up to 78%, whereas larger datasets can actually lead to overfitting and worse results. Furthermore, we identify a crucial trade-off: while specialized (fine-tuned) models are good at fixing simple bugs quickly, general (base) models are better at incorporating feedback and fixing complex problems over successive attempts.

Perspectives

Writing this paper highlighted how the principle of "less is more" applies to Large Language Models in software engineering. We were surprised to find that feeding the models massive amounts of training data often hurt their ability to think flexibly, causing them to memorize patterns rather than reasoning through errors. It was equally intriguing to observe the distinct "personalities" of the models: base models acted like students who improved significantly when told why their code failed, while fine-tuned models behaved like rigid experts—they either solved the problem immediately or struggled to adapt. We hope this work encourages the community to move away from simply scaling up data and towards designing smarter, iterative interactions between humans, AI, and compiler feedback.

Fernando Vallecillos Ruiz
Simula Research Laboratory

Read the Original

This page is a summary of: The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models, June 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3756681.3756966.