Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis

Fengjie Li, Jiajun Jiang, Jiajun Sun, Hongyu Zhang
2024-06-04
Abstract: Automated Program Repair (APR) has garnered significant attention due to its potential to streamline the bug repair process for human developers. Recently, LLM-based APR methods have shown promise in repairing real-world bugs. However, existing APR methods often utilize patches generated by LLMs without further optimization, resulting in reduced effectiveness due to the lack of program-specific knowledge. Furthermore, the evaluations of these APR methods have typically been conducted under the assumption of perfect fault localization, which may not accurately reflect their real-world effectiveness. To address these limitations, this paper introduces an innovative APR approach called GIANTREPAIR. Our approach leverages the insight that LLM-generated patches, although not necessarily correct, offer valuable guidance for the patch generation process. Based on this insight, GIANTREPAIR first constructs patch skeletons from LLM-generated patches to confine the patch space, and then generates high-quality patches tailored to specific programs through context-aware patch generation by instantiating the skeletons. To evaluate the performance of our approach, we conduct two large-scale experiments. The results demonstrate that GIANTREPAIR not only effectively repairs more bugs (an average of 27.78% on Defects4J v1.2 and 23.40% on Defects4J v2.0) than using LLM-generated patches directly, but also outperforms state-of-the-art APR methods by repairing at least 42 and 7 more bugs under perfect and automated fault localization scenarios, respectively.
Software Engineering
What problem does this paper attempt to address?
The paper addresses two main problems:

1. **Limitations of existing LLM-based automated program repair (APR) methods**:
   - Current methods typically use LLM-generated patches directly, without further optimization or refinement. As a result, the patches may not use the right program-specific elements, such as local variables and domain-specific function calls, and therefore fail the test cases.
   - Existing evaluations assume perfect fault localization, which is unrealistic in practice because automated fault localization techniques are often inaccurate. These methods therefore need to be evaluated under a more realistic automated fault localization scenario.
2. **How to use "incorrect" LLM-generated patches to improve overall repair ability**:
   - Although LLM-generated patches may not be fully correct, they can offer valuable guidance for patch generation. How to exploit these "incorrect" patches effectively has not yet been fully explored.

To address these problems, the paper proposes a new APR method, GIANTREPAIR. Its core idea is to treat LLM-generated patches as initial guidance: it abstracts them into patch skeletons and then applies context-aware patch generation to produce high-quality patches. The specific steps are:

1. **Patch skeleton construction**:
   - Extract the concrete code modifications from LLM-generated patches.
   - Abstract these modifications into patch skeletons to confine the patch space.
2. **Patch instantiation**:
   - Use static analysis to replace the abstract tokens in the patch skeletons with concrete program elements, producing executable patches.
3. **Patch ranking and verification**:
   - Rank the generated patches by quality so that those most likely to be correct are evaluated first, and validate them against the test cases.

Through these steps, GIANTREPAIR not only improves the effectiveness of LLM-generated patches but also shows superior performance under the more realistic automated fault localization scenario.
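
The three steps above can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it works on whitespace-separated tokens rather than on syntax trees, it ignores types, and the function names (`build_skeleton`, `instantiate`, `rank`, `repair`), the keyword set, and the similarity-based ranking are all assumptions made for illustration. The `passes_tests` callback is a hypothetical stand-in for running the project's test suite.

```python
import re
from difflib import SequenceMatcher
from itertools import product

# Assumption: a tiny keyword set; a real tool would use the language grammar.
KEYWORDS = {"if", "else", "return", "null"}
IDENT = re.compile(r"[A-Za-z_]\w*")
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def build_skeleton(llm_patch):
    """Step 1: abstract every non-keyword identifier into a numbered hole."""
    holes = {}  # identifier -> hole label, in order of first appearance
    skeleton = []
    for tok in TOKEN.findall(llm_patch):
        if IDENT.fullmatch(tok) and tok not in KEYWORDS:
            holes.setdefault(tok, f"<H{len(holes)}>")
            skeleton.append(holes[tok])
        else:
            skeleton.append(tok)
    return skeleton, list(holes)


def instantiate(skeleton, hole_names, in_scope):
    """Step 2: fill the holes with names that exist in the buggy program."""
    n = len(hole_names)
    for combo in product(in_scope, repeat=n):
        if len(set(combo)) < n:
            continue  # simplification: distinct holes get distinct names
        mapping = {f"<H{i}>": name for i, name in enumerate(combo)}
        yield " ".join(mapping.get(tok, tok) for tok in skeleton)


def rank(candidates, llm_patch):
    """Step 3a: order candidates by textual similarity to the LLM patch
    (an assumed heuristic, chosen only to make the sketch concrete)."""
    ref = " ".join(TOKEN.findall(llm_patch))
    return sorted(
        candidates,
        key=lambda c: SequenceMatcher(None, ref, c).ratio(),
        reverse=True,
    )


def repair(llm_patch, in_scope, passes_tests):
    """Step 3b: return the first ranked candidate that passes the tests."""
    skeleton, holes = build_skeleton(llm_patch)
    for cand in rank(instantiate(skeleton, holes, in_scope), llm_patch):
        if passes_tests(cand):
            return cand
    return None


# The LLM guessed variable names that do not exist in the program.
llm_patch = "if (index < size) return size;"
in_scope = ["idx", "len"]  # hypothetical names visible at the fault location
fixed = repair(llm_patch, in_scope,
               passes_tests=lambda p: p == "if ( idx < len ) return len ;")
print(fixed)  # prints: if ( idx < len ) return len ;
```

In this toy case the LLM patch has the right shape but refers to `index` and `size`, which the program does not define. Abstracting those names into holes and refilling them from the in-scope names recovers a patch that the stand-in test accepts, which is the intuition behind using "incorrect" patches as guidance.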