Enhancing IR-based Fault Localization using Large Language Models

Shuai Shao, Tingting Yu
2024-12-05
Abstract: Information Retrieval-based Fault Localization (IRFL) techniques aim to identify source files containing the root causes of reported failures. While existing techniques excel in ranking source files, challenges persist in bug report analysis and query construction, leading to potential information loss. Leveraging large language models like GPT-4, this paper enhances IRFL by categorizing bug reports based on programming entities, stack traces, and natural language text. Tailored query strategies, the initial step in our approach (LLmiRQ), are applied to each category. To address inaccuracies in queries, we introduce a user and conversational-based query reformulation approach, termed LLmiRQ+. Additionally, to further enhance query utilization, we implement a learning-to-rank model that leverages key features such as class name match score and call graph score. This approach significantly improves the relevance and accuracy of queries. Evaluation on 46 projects with 6,340 bug reports yields an MRR of 0.6770 and MAP of 0.5118, surpassing seven state-of-the-art IRFL techniques, showcasing superior performance.
Software Engineering
What problem does this paper attempt to address?
This paper addresses the challenges that information retrieval-based fault localization (IRFL) techniques face in bug report analysis and query construction. Although existing IRFL techniques rank source files well, they have the following deficiencies:

1. **Bug report analysis**: Traditional methods often fail to capture the semantic content of bug reports, which loses information or introduces noise.
2. **Query construction**: Existing strategies may produce inaccurate or incomplete queries, especially for reports that contain programming entities, stack traces, and natural language text.
3. **Query accuracy**: Inaccurate or lossy queries make fault localization less precise.

To address these problems, the paper proposes LLmiRQ (Large Language Model for Information Retrieval Query), which uses large language models such as GPT-4 to enhance IRFL. The main improvements are:

- **Classifying bug reports**: Bug reports are categorized according to whether they contain programming entities, stack traces, or natural language text, and a tailored query strategy is applied to each category.
- **Query optimization**: The initial query is constructed with iteratively designed prompts so that it is more precise and effective.
- **Interactive query rewriting**: A user feedback mechanism (LLmiRQ+) refines the query through conversation with the user to improve its accuracy and relevance.
- **Learning-to-rank model**: A learning-to-rank (LtR) model combines key features, such as class name match score and call graph score, to further improve the relevance and accuracy of the ranked results.
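
The classification step can be illustrated with a minimal sketch. This is a stand-in only: the paper performs categorization and query construction with an LLM such as GPT-4, whereas the code below uses regular-expression heuristics to show the idea of routing a report to a category-specific query strategy. The patterns, the function names `categorize_report` and `build_query`, the category labels, and the Java-style stack frame format are all assumptions made for illustration.

```python
import re

# Assumed heuristic: a Java-style stack frame such as
#   "at org.example.io.FileStore.save(FileStore.java:42)"
# Group 1 captures the fully qualified class name.
STACK_FRAME = re.compile(
    r"^\s*at\s+([\w.$]+)\.[\w$<>]+"
    r"\(([\w$]+\.java:\d+|Native Method|Unknown Source)\)",
    re.MULTILINE,
)

# Assumed heuristic: a CamelCase token with at least two "humps"
# (e.g. FileWriter) is treated as a programming entity.
CAMEL_CASE = re.compile(r"\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+\b")


def categorize_report(text: str) -> str:
    """Assign a bug report to one of three hypothetical category labels."""
    if STACK_FRAME.search(text):
        return "stack_trace"
    if CAMEL_CASE.search(text):
        return "programming_entity"
    return "natural_language"


def build_query(text: str) -> list[str]:
    """Build query terms with a different (illustrative) strategy per category."""
    category = categorize_report(text)
    if category == "stack_trace":
        # Use the simple class name of each frame, top of the trace first.
        terms = [m.group(1).split(".")[-1] for m in STACK_FRAME.finditer(text)]
    elif category == "programming_entity":
        # Use the CamelCase identifiers mentioned in the report.
        terms = CAMEL_CASE.findall(text)
    else:
        # Fall back to lowercase words longer than three characters.
        terms = [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3]
    # Remove duplicates while preserving order.
    return list(dict.fromkeys(terms))


if __name__ == "__main__":
    report = (
        "App crashes on save\n"
        "java.lang.NullPointerException\n"
        "    at org.example.io.FileStore.save(FileStore.java:42)\n"
        "    at org.example.ui.Editor.onSave(Editor.java:17)"
    )
    print(categorize_report(report), build_query(report))
```

Checking for stack traces first reflects the assumption that a trace is the most specific signal in a report. An LLM-based classifier, as used in the paper, would not need this fixed priority order.
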
Through these improvements, LLmiRQ and its enhanced version LLmiRQ+ improve fault localization accuracy: evaluated on 46 projects with 6,340 bug reports, the approach achieves an MRR of 0.6770 and a MAP of 0.5118, surpassing seven state-of-the-art IRFL techniques.
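
The abstract reports results as MRR (mean reciprocal rank) and MAP (mean average precision). The sketch below shows how these two metrics are commonly computed from ranked file lists, using the standard textbook definitions. The paper's exact evaluation script is not given here, so the function names, file names, and toy data are assumptions for illustration.

```python
def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """Return 1 / rank of the first relevant file, or 0.0 if none is ranked."""
    for rank, path in enumerate(ranked, start=1):
        if path in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Average the precision at each rank where a relevant file appears."""
    hits = 0
    precision_sum = 0.0
    for rank, path in enumerate(ranked, start=1):
        if path in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_metric(metric, results) -> float:
    """Average a per-report metric over (ranked list, relevant set) pairs."""
    return sum(metric(ranked, relevant) for ranked, relevant in results) / len(results)


# Toy data: two bug reports, each with a ranked file list and its buggy files.
results = [
    (["A.java", "B.java", "C.java", "D.java"], {"A.java", "D.java"}),
    (["X.java", "Y.java", "Z.java"], {"Y.java"}),
]

print(mean_metric(reciprocal_rank, results))    # 0.75  = (1/1 + 1/2) / 2
print(mean_metric(average_precision, results))  # 0.625 = (0.75 + 0.5) / 2
```

MRR rewards placing the first buggy file near the top of the list, while MAP also accounts for where the remaining buggy files appear. This is why the two values can differ for the same ranking.
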