Abstract: Autonomous program improvement typically involves automatically producing bug fixes and feature additions. Such program improvement can be accomplished by a combination of large language model (LLM) and program analysis capabilities, in the form of an LLM agent. Since program repair or program improvement typically requires a specification of intended behavior, specification inference can be useful for producing high-quality program patches. In this work, we examine efficient and low-cost workflows for iterative specification inference within an LLM agent. Given a GitHub issue to be resolved in a software project, our goal is to conduct iterative code search accompanied by specification inference, thereby inferring intent from both the project structure and behavior. The intent thus captured is examined by a reviewer agent with the goal of vetting the patches as well as providing a measure of confidence in the vetted patches. Our approach SpecRover (AutoCodeRover-v2) is built on the open-source LLM agent AutoCodeRover. In an evaluation on the full SWE-Bench consisting of 2294 GitHub issues, it shows more than 50% improvement in efficacy over AutoCodeRover. Compared to the open-source agents available, our work shows modest cost ($0.65 per issue) in resolving an average GitHub issue in SWE-Bench Lite. The explanations produced by SpecRover allow a better "signal" to be given to the developer on when the suggested patches can be accepted with confidence. SpecRover also seeks to demonstrate the continued importance of specification inference in automated program repair, even as program repair technologies enter the LLM era.

What problem does this paper attempt to address?

This paper addresses key problems in automatic program improvement, in particular how to efficiently infer and use program specifications when large language models (LLMs) are applied to program repair and feature addition. Specifically, it explores how iterative specification inference can guide patch generation in an LLM-guided autonomous software engineering workflow, so that the generated patches accurately reflect the developers' intentions.

#### Main problem description

1. **Requirements for automatic program improvement**:
   - Automatic program improvement usually involves automatically generating bug fixes and feature additions. These improvements need to capture the developers' intentions to guide the repair process.
   - Existing tools such as GitHub Copilot can generate code automatically, but the generated code may contain errors and vulnerabilities, so further improvement is required.
2. **Importance of specification inference**:
   - Specification inference can help capture developers' intentions from program structure and behavior, and thereby produce high-quality program patches.
   - Inferred specifications can also serve as evidence for why a given patch is correct, which helps simplify software maintenance and increases trust in the code.
3. **Limitations of existing methods**:
   - Existing methods such as symbolic analysis infer specifications from test cases and can overfit the test data.
   - For programs without accompanying tests, existing specification inference methods may not be effective.
   - Symbolic analysis has a relatively high entry barrier for developers.
4. **The proposed new method, SpecRover**:
   - SpecRover is an improved tool based on AutoCodeRover that aims to make automatic program improvement more effective through iterative specification inference.
   - It combines code search, specification inference, and a review mechanism so that the generated patches not only resolve the issue but also conform to the developers' intentions.
   - SpecRover also includes a reviewer agent that checks the correctness of patches and provides a measure of confidence.

#### Core contributions of the paper

1. **Specification inference**: Examines the role of specification inference in LLM-guided autonomous software engineering and proposes a method that guides patch generation through iterative specification inference.
2. **Patch suggestions with confidence**: Designs a reviewer agent that reconciles specifications, tests, and natural-language requirements, provides comprehensive patch vetting, and generates evidence of patch correctness.
3. **Experimental verification**: Demonstrates SpecRover's efficacy on the SWE-Bench benchmark, resolving 19.3% of the issues in the full SWE-Bench and 31% of the issues in SWE-Bench Lite, while keeping cost low ($0.65 per issue) and supporting higher precision/recall.

### Summary

This paper proposes SpecRover to address specification inference in automatic program improvement: in the LLM era, how to efficiently capture and use developers' intentions to generate high-quality, trustworthy patches.
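
The workflow described above (iterative code search accompanied by specification inference, followed by a reviewer that vets the patch and reports a confidence) can be illustrated with a toy sketch. This is not SpecRover's implementation or API: the names `Spec`, `Review`, `infer_specs`, and `review_patch` are hypothetical, the `summarize` callable stands in for an LLM call, the code search is plain substring matching over an in-memory dictionary, and the confidence score is an invented placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Spec:
    location: str  # function the inferred specification is attached to
    intent: str    # natural-language statement of intended behavior


@dataclass
class Review:
    accepted: bool
    confidence: float  # toy score in [0, 1], not the paper's measure
    explanation: str


def infer_specs(issue: str, codebase: Dict[str, str],
                summarize: Callable[[str, str, str], str],
                max_rounds: int = 3) -> List[Spec]:
    """Alternate code search and spec inference for a bounded number of rounds."""
    # Seed the search with functions the issue mentions by name.
    frontier = [name for name in codebase if name in issue]
    seen, specs = set(), []
    for _ in range(max_rounds):
        next_frontier = []
        for name in frontier:
            if name in seen:
                continue
            seen.add(name)
            source = codebase[name]
            # Hypothetical LLM step: state what this function is meant to do.
            specs.append(Spec(name, summarize(issue, name, source)))
            # Follow references to other known functions in the retrieved code.
            next_frontier += [other for other in codebase
                              if other not in seen and other in source]
        if not next_frontier:
            break
        frontier = next_frontier
    return specs


def review_patch(patched: List[str], specs: List[Spec],
                 test_passed: bool) -> Review:
    """Vet a patch against inferred specs and a reproducer-test outcome."""
    covered = [s.location for s in specs if s.location in patched]
    if not covered:
        return Review(False, 0.0, "patch touches no location with an inferred spec")
    if not test_passed:
        return Review(False, 0.0, "reproducer test still fails after the patch")
    # Placeholder confidence: share of patched locations backed by a spec.
    confidence = len(covered) / len(patched)
    return Review(True, confidence,
                  "patch addresses inferred intent of: " + ", ".join(covered))


# Toy usage with a three-function "repository".
codebase = {
    "parse_date": "def parse_date(s): return normalize(s).split('-')",
    "normalize": "def normalize(s): return s.strip()",
    "unrelated": "def unrelated(): pass",
}
issue = "parse_date crashes on input with trailing whitespace"
specs = infer_specs(issue, codebase,
                    lambda issue, name, src: f"{name} should handle: {issue}")
review = review_patch(["normalize"], specs, test_passed=True)
```

In this sketch the search starts from `parse_date` (named in the issue), reaches `normalize` through the retrieved source, and never visits `unrelated`. The reviewer's explanation string plays the role of the evidence that lets a developer decide whether to accept the patch.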

SpecRover: Code Intent Extraction via LLMs
