Enhancing Fault Localization Through Ordered Code Analysis with LLM Agents and Self-Reflection

Md Nakhla Rafi, Dong Jae Kim, Tse-Hsun Chen, Shaowei Wang
2024-09-21
Abstract:Locating and fixing software faults is a time-consuming and resource-intensive task in software development. Traditional fault localization methods, such as Spectrum-Based Fault Localization (SBFL), rely on statistical analysis of test coverage data but often suffer from lower accuracy. Learning-based techniques, while more effective, require extensive training data and can be computationally expensive. Recent advancements in Large Language Models (LLMs) offer promising improvements in fault localization by enhancing code comprehension and reasoning. However, these LLM-based techniques still face challenges, including token limitations, degraded performance with long inputs, and difficulties managing large-scale projects with complex systems involving multiple interacting components. To address these issues, we introduce LLM4FL, a novel LLM-agent-based fault localization approach that integrates SBFL rankings with a divide-and-conquer strategy. By dividing large coverage data into manageable groups and employing multiple LLM agents through prompt chaining, LLM4FL navigates the codebase and localizes faults more effectively. The approach also incorporates self-reflection and chain-of-thought reasoning, enabling agents to iteratively generate fixes and re-rank suspicious methods. We evaluated LLM4FL on the Defects4J (V2.0.0) benchmark, comprising 675 real-world faults from 14 open-source Java projects. Our results demonstrate that LLM4FL outperforms AutoFL by 19.27% in Top-1 accuracy and surpasses state-of-the-art supervised techniques such as DeepFL and Grace, all without task-specific training. Additionally, we highlight the impact of coverage splitting and prompt chaining on fault localization performance and show that different method ordering can improve Top-1 accuracy by up to 22%.
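The divide-and-conquer step described above (rank the covered methods with SBFL, then split the ranked list into groups that fit the LLM's context window) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Ochiai is used here as one common SBFL formula, and `MethodSpectrum`, `rank_and_group`, and the whitespace-based token estimate are hypothetical names and simplifications.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class MethodSpectrum:
    """Coverage spectrum for one method (hypothetical structure)."""
    name: str
    source: str
    failed_covering: int  # failing tests that execute this method (e_f)
    passed_covering: int  # passing tests that execute this method (e_p)


def ochiai(m: MethodSpectrum, total_failed: int) -> float:
    """Ochiai suspiciousness: e_f / sqrt(total_failed * (e_f + e_p))."""
    denom = math.sqrt(total_failed * (m.failed_covering + m.passed_covering))
    return m.failed_covering / denom if denom else 0.0


def approx_tokens(text: str) -> int:
    """Crude token estimate by whitespace splitting (an assumption;
    real LLM tokenizers count differently)."""
    return len(text.split())


def rank_and_group(methods: List[MethodSpectrum], total_failed: int,
                   token_budget: int) -> List[List[MethodSpectrum]]:
    """Sort methods by descending Ochiai score, then pack them in that order
    into consecutive groups whose estimated size stays within token_budget."""
    ranked = sorted(methods, key=lambda m: ochiai(m, total_failed), reverse=True)
    groups: List[List[MethodSpectrum]] = []
    current: List[MethodSpectrum] = []
    used = 0
    for m in ranked:
        cost = approx_tokens(m.source)
        # Close the current group when this method would overflow the budget.
        # A method larger than the budget still gets a group of its own.
        if current and used + cost > token_budget:
            groups.append(current)
            current, used = [], 0
        current.append(m)
        used += cost
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Toy spectra with hypothetical method names.
    methods = [
        MethodSpectrum("Foo.parse", "int parse ( String s ) { return 0 ; }", 2, 1),
        MethodSpectrum("Foo.init", "void init ( ) { }", 0, 5),
        MethodSpectrum("Bar.add", "int add ( int a , int b ) { return a + b ; }", 1, 3),
    ]
    for i, group in enumerate(rank_and_group(methods, total_failed=2, token_budget=25)):
        print(i, [m.name for m in group])
    # prints:
    # 0 ['Foo.parse']
    # 1 ['Bar.add', 'Foo.init']
```

Because groups are filled in ranked order, the most suspicious methods land in the first group the agents see, which is one plausible reason the initial method ordering affects Top-1 accuracy.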
Software Engineering
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper aims to address several key issues in software fault localization:

1. **Limitations of Traditional Fault Localization Methods**:
   - Traditional fault localization methods, such as Spectrum-Based Fault Localization (SBFL), rely on statistical analysis of test coverage data but have low accuracy.
   - Learning-based methods, although more effective, require large amounts of training data and are computationally expensive.
2. **Challenges of Large-Scale Software Projects**:
   - In large-scale software projects, the volume and complexity of covered code often exceed the token limits of large language models (LLMs).
   - LLM performance degrades on long inputs, making it difficult to handle complex systems with multiple interacting components.
3. **Shortcomings of Existing LLM-Based Techniques**:
   - They are constrained by token limits when processing large code files or extensive coverage data.
   - Their performance declines on complex systems, making it hard to maintain accuracy and consistency.
   - How to combine LLMs effectively with traditional fault localization techniques, so that each compensates for the other's weaknesses, has not yet been fully explored.

### Solution

To address these issues, the authors propose **LLM4FL**, an LLM-based fault localization technique that improves the efficiency and accuracy of fault localization through the following strategies:

1. **Divide-and-Conquer Strategy**:
   - Rank the covered methods with SBFL, then divide the coverage data into manageable groups, each within the LLM's token limit.
   - The LLM processes the groups one at a time, avoiding problems caused by token limits.
2. **Multi-Agent Collaboration**:
   - Two LLM agents are introduced: a Tester Agent and a Debugger Agent.
   - The Tester Agent identifies and prioritizes suspicious methods by analyzing failing tests, stack traces, and related test information.
   - The Debugger Agent evaluates and ranks the candidate methods in depth by analyzing their source code and related behavior.
3. **Self-Reflection and Chain-of-Thought Reasoning**:
   - Through self-reflection and chain-of-thought reasoning, the agents iteratively generate repair suggestions and re-rank suspicious methods, improving fault localization accuracy.

### Experimental Results

- **Performance Improvement**:
  - LLM4FL outperforms AutoFL by 19.27% in Top-1 accuracy on Defects4J (V2.0.0).
  - LLM4FL also surpasses state-of-the-art supervised techniques such as DeepFL and Grace, without any task-specific training.
- **Component Impact Analysis**:
  - Coverage splitting and prompt chaining play a crucial role in fault localization accuracy; removing these components leads to significant performance degradation.
  - The initial ordering of methods has a significant impact on performance: different orderings can improve Top-1 accuracy by up to 22%.

### Summary

The paper proposes LLM4FL, a new LLM-based fault localization technique that addresses fault localization in large-scale software projects through a divide-and-conquer strategy and multi-agent collaboration. Experimental results show that LLM4FL significantly outperforms existing fault localization techniques in accuracy.
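The Top-1 results above are instances of the Top-N metric. The sketch below assumes the convention common in fault localization studies: a bug counts as localized at Top-N when at least one of its ground-truth faulty methods appears among the first N entries of the ranking. The function name and the toy bug IDs and method names are hypothetical; this is not the paper's evaluation script.

```python
from typing import Dict, List, Set


def top_n(rankings: Dict[str, List[str]],
          faulty: Dict[str, Set[str]],
          n: int) -> int:
    """Number of bugs for which at least one ground-truth faulty method
    appears among the first n entries of that bug's ranked list."""
    hits = 0
    for bug_id, ranked in rankings.items():
        truth = faulty.get(bug_id, set())
        if any(method in truth for method in ranked[:n]):
            hits += 1
    return hits


if __name__ == "__main__":
    # Toy rankings and ground truth (hypothetical data).
    rankings = {
        "bug-1": ["A.f", "B.g", "C.h"],
        "bug-2": ["D.x", "E.y", "F.z"],
        "bug-3": ["G.p", "H.q"],
    }
    faulty = {
        "bug-1": {"A.f"},
        "bug-2": {"F.z"},
        "bug-3": {"X.missing"},
    }
    for n in (1, 3, 5):
        print(f"Top-{n}: {top_n(rankings, faulty, n)}/{len(rankings)}")
    # prints:
    # Top-1: 1/3
    # Top-3: 2/3
    # Top-5: 2/3
```

Since the metric only inspects the head of each ranked list, re-ranking candidates (for example, after self-reflection) can change Top-1 noticeably even when Top-5 stays the same.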