Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair
Qiong Feng, Xiaotian Ma, Jiayi Sheng, Ziyuan Feng, Wei Song, Peng Liang
2024-12-05
Abstract: LLMs have garnered considerable attention for their potential to streamline Automated Program Repair (APR). LLM-based approaches can either insert the correct code or directly generate patches when provided with buggy methods. However, most LLM-based APR methods rely on a single type of software information, without fully leveraging different software artifacts. Moreover, many LLM-based approaches do not explore which specific types of information best assist in APR. Addressing this gap is crucial for advancing LLM-based APR techniques. We propose DEVLoRe, which uses issue content (description and message) and error stack traces to localize buggy methods, then relies on debug information in the buggy methods, together with issue content and error stack traces, to localize buggy lines and generate plausible patches that pass all unit tests. The results show that while issue content is particularly effective in assisting LLMs with fault localization and program repair, different types of software artifacts complement each other. By incorporating different artifacts, DEVLoRe successfully locates 49.3% and 47.6% of single and non-single buggy methods and generates 56.0% and 14.5% plausible patches for the Defects4J v2.0 dataset, respectively. This outperforms current state-of-the-art APR methods. The source code and experimental results of this work for replication are available at https://github.com/XYZboom/DEVLoRe.
Software Engineering, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses a shortcoming of existing automated program repair (APR) methods based on large language models (LLMs): they do not fully leverage the variety of available software artifacts. Specifically:
1. **Dependence on a single type of information**: Most existing LLM-based APR methods rely mainly on a single type of software information, such as issue descriptions or error stack traces, rather than combining multiple software artifacts. This limits their effectiveness in fault localization and program repair.
2. **Unclear utility of different information types**: Although some LLM-based APR methods do use different software artifacts, such as issue descriptions and error stack traces, it remains unclear which type of information helps LLMs most with fault localization and program repair.
To address these problems, the authors propose DEVLoRe (Developer Localization and Repair), a framework that improves LLM performance in fault localization and program repair by integrating multiple software artifacts. The framework proceeds in three steps:
1. **Faulty method localization**: First, the issue content (description and discussion) and error stack traces are used to locate the methods that contain the bug.
2. **Faulty line localization**: Next, debugging information, issue content, and error stack traces are used to pinpoint the specific faulty lines within those methods.
3. **Patch generation**: Finally, all relevant information (the located faulty methods and lines, the complete body of each faulty method, the issue content, error stack traces, and detailed debugging information) is given to the LLM to generate plausible patches, i.e., patches that pass all unit tests.
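The three steps can be pictured as a simple pipeline. The Python sketch below is a minimal illustration under stated assumptions, not the DEVLoRe implementation: the `Artifacts` class, the prompt wording, and the `llm`, `get_body`, and `run_tests` callbacks are hypothetical stand-ins (a real setup would call an LLM API, collect debugging information per method, and run the project's unit tests). A candidate patch is accepted as plausible when `run_tests` reports that all tests pass.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


# Hypothetical bundle of the software artifacts used by the pipeline.
@dataclass
class Artifacts:
    issue_content: str    # issue description and discussion
    stack_trace: str      # error stack trace from the failing test
    debug_info: str = ""  # runtime values observed while debugging the method


# Any function from a prompt string to a response string can act as the LLM.
LLM = Callable[[str], str]


def locate_methods(llm: LLM, art: Artifacts) -> List[str]:
    """Step 1: issue content + stack trace -> names of suspected buggy methods."""
    prompt = (f"Issue:\n{art.issue_content}\nStack trace:\n{art.stack_trace}\n"
              "List the buggy methods, one per line.")
    return [name.strip() for name in llm(prompt).splitlines() if name.strip()]


def locate_lines(llm: LLM, art: Artifacts, body: str) -> str:
    """Step 2: method body + debug info + issue + stack trace -> faulty lines."""
    prompt = (f"Method:\n{body}\nDebug info:\n{art.debug_info}\n"
              f"Issue:\n{art.issue_content}\nStack trace:\n{art.stack_trace}\n"
              "Quote the faulty lines.")
    return llm(prompt)


def generate_patch(llm: LLM, art: Artifacts, method: str, body: str, lines: str) -> str:
    """Step 3: every artifact gathered so far -> candidate patch."""
    prompt = (f"Method {method}:\n{body}\nFaulty lines:\n{lines}\n"
              f"Debug info:\n{art.debug_info}\nIssue:\n{art.issue_content}\n"
              f"Stack trace:\n{art.stack_trace}\nWrite the fixed code.")
    return llm(prompt)


def repair(llm: LLM, art: Artifacts,
           get_body: Callable[[str], str],
           run_tests: Callable[[str, str], bool]) -> Optional[str]:
    """Return the first plausible patch (one that passes all tests), else None."""
    for method in locate_methods(llm, art):
        body = get_body(method)
        lines = locate_lines(llm, art, body)
        patch = generate_patch(llm, art, method, body, lines)
        if run_tests(method, patch):
            return patch
    return None


# Usage with a scripted stand-in for a real LLM (hypothetical example bug).
def fake_llm(prompt: str) -> str:
    if prompt.endswith("one per line."):
        return "MathUtils.max\n"
    if prompt.endswith("Quote the faulty lines."):
        return "return a < b ? a : b;"
    return "return a > b ? a : b;"


art = Artifacts(issue_content="max() returns the smaller argument",
                stack_trace="AssertionError at MathUtilsTest.testMax",
                debug_info="a=3, b=5, returned 3")
bodies = {"MathUtils.max": "int max(int a, int b) { return a < b ? a : b; }"}
patch = repair(fake_llm, art, bodies.__getitem__,
               run_tests=lambda method, p: ">" in p)  # toy test oracle
print(patch)  # return a > b ? a : b;
```

Passing the LLM and the test runner in as plain callables keeps each stage testable on its own and makes it easy to swap in a different model or benchmark harness.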
By drawing on multiple software artifacts across these three steps, DEVLoRe makes fuller use of the available information for fault localization and program repair. The experiments show that issue content is particularly effective, and that issue content, error stack traces, and debugging information complement each other, so combining them improves the accuracy of both fault localization and patch generation.
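Finding out which artifact helps most amounts to running the same pipeline under different artifact combinations and comparing the results. The Python sketch below builds such an ablation grid; the artifact names and the `evaluate` callback are hypothetical placeholders and are not taken from the DEVLoRe code. In practice, `evaluate` would run localization or repair on a benchmark such as Defects4J using only the given artifacts and return a success rate.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical artifact names; each stands for one kind of information in a prompt.
ARTIFACTS: Tuple[str, ...] = ("issue_content", "stack_trace", "debug_info")


def artifact_subsets(artifacts: Tuple[str, ...] = ARTIFACTS) -> List[Tuple[str, ...]]:
    """All non-empty combinations of artifacts, smallest first."""
    return [combo
            for size in range(1, len(artifacts) + 1)
            for combo in combinations(artifacts, size)]


def run_ablation(evaluate: Callable[[FrozenSet[str]], float]) -> Dict[Tuple[str, ...], float]:
    """Score each configuration with a caller-supplied evaluate() function,
    e.g. the fraction of bugs whose buggy method is localized."""
    return {combo: evaluate(frozenset(combo)) for combo in artifact_subsets()}


# Print the configurations that an ablation study would compare.
for combo in artifact_subsets():
    print(" + ".join(combo))
```

With three artifacts this yields seven configurations, from each artifact alone up to all three combined; comparing their scores shows both which single artifact is strongest and whether combinations add value beyond it.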