Abstract: Fault localization (FL) and automated program repair (APR) are two main tasks of automatic software debugging. Compared with traditional methods, deep learning-based approaches have been shown to achieve better performance on FL and APR tasks. However, existing deep learning-based FL methods ignore deep semantic features or consider only simple code representations. For APR, existing template-based methods are weak at selecting the correct fix templates for effective program repair, and they cannot synthesize patches using the end-to-end code modification knowledge that models learn when trained on large-scale bug-fix code pairs. Moreover, in most FL and APR methods, model design and training are performed separately, so updated parameters and extracted knowledge are not effectively shared during training. This limitation hinders further improvement on FL and APR tasks. To solve these problems, we propose a novel approach called MTL-TRANSFER, which leverages a multi-task learning strategy to extract deep semantic features and transferred knowledge from different perspectives. First, we construct a large-scale open-source bug dataset and implement 11 multi-task learning models for the bug detection and patch generation sub-tasks on 11 commonly used bug types, as well as one multi-classifier that learns the semantics relevant to the subsequent fix template selection task. Second, an MLP-based ranking model fuses spectrum-based, mutation-based, and semantic-based features to generate a ranked list of suspicious statements. Third, we combine the patches generated by the neural patch generation sub-task of the multi-task learning strategy with the optimized fix template selection order obtained from the multi-classifier.
Finally, the more accurate FL results, the optimized fix template selection order, and the expanded patch candidates are combined to further improve overall APR performance. Extensive experiments on the widely used Defects4J benchmark show that MTL-TRANSFER outperforms all baselines on both FL and APR tasks, demonstrating the effectiveness of our approach. Compared with our previously proposed FL method TRANSFER-FL (which is also the state-of-the-art statement-level FL method), MTL-TRANSFER increases the number of faults hit on the Top-1/3/5 metrics by 8/11/12 (92/159/183 in total). On APR tasks, MTL-TRANSFER successfully repairs 75 bugs under the perfect localization setting, 8 more than our previous APR method TRANSFER-PR. Furthermore, an experiment simulating actual repair scenarios shows that MTL-TRANSFER repairs 15 and 9 more bugs (56 in total) than TBar and TRANSFER, respectively, which demonstrates the effectiveness of combining our optimized FL and APR components.
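The second step of the approach fuses three families of per-statement features with an MLP to produce a ranked list of suspicious statements. The sketch below is a minimal illustration, not the MTL-TRANSFER implementation, under these assumptions: the spectrum-based feature is the Ochiai formula over coverage counts, the mutation-based and semantic-based scores are precomputed numbers in [0, 1], and the MLP weights are hypothetical hand-set values standing in for a trained model. It also shows how a Top-N hit (as in the Top-1/3/5 metrics) can be checked for a single bug; the statement IDs and the helper names are hypothetical.

```python
import math

def ochiai(failed_cov, passed_cov, total_failed):
    """Spectrum-based suspiciousness (Ochiai) from coverage counts."""
    denom = math.sqrt(total_failed * (failed_cov + passed_cov))
    return failed_cov / denom if denom > 0 else 0.0

def mlp_score(features, w_hidden, b_hidden, w_out, b_out):
    """One ReLU hidden layer followed by a sigmoid output in (0, 1)."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))

def rank_statements(stmts, params):
    """Return statement IDs sorted by fused suspiciousness, highest first."""
    scored = []
    for sid, feat in stmts.items():
        x = [ochiai(feat["ef"], feat["ep"], feat["nf"]),  # spectrum-based
             feat["mutation"],                           # mutation-based (assumed precomputed)
             feat["semantic"]]                           # semantic-based (assumed precomputed)
        scored.append((mlp_score(x, *params), sid))
    # Sort by descending score; break ties by ID so the order is deterministic.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [sid for _, sid in scored]

def top_n_hit(ranking, faulty, n):
    """True if any known-faulty statement appears in the first n positions."""
    return any(sid in faulty for sid in ranking[:n])

# Hypothetical weights (2 hidden units, 3 input features) standing in for a trained MLP.
PARAMS = ([[1.0, 0.5, 0.5], [0.2, 1.0, 1.0]], [0.0, -0.5], [1.5, 1.0], -1.0)

# Hypothetical statements: ef/ep = failing/passing tests covering the statement,
# nf = total failing tests.
STMTS = {
    "Foo.java:12": {"ef": 2, "ep": 1, "nf": 2, "mutation": 0.8, "semantic": 0.9},
    "Foo.java:20": {"ef": 2, "ep": 8, "nf": 2, "mutation": 0.1, "semantic": 0.2},
    "Bar.java:7":  {"ef": 0, "ep": 5, "nf": 2, "mutation": 0.0, "semantic": 0.4},
}

ranking = rank_statements(STMTS, PARAMS)
# ranking == ["Foo.java:12", "Foo.java:20", "Bar.java:7"]
```

With a faulty statement at `Foo.java:20`, this bug would count as a Top-3 hit but not a Top-1 hit; a benchmark-level "faults hit" figure would sum such hits over all bugs.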

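The third step of the abstract combines neural patches with a fix template order derived from the multi-classifier. A minimal sketch of one way to do this, assuming the classifier emits one logit per fix template, that template-based patches are tried (in classifier order) before neural ones, and that duplicates are detected by whitespace-normalized text. The template names, patches, and merge policy are hypothetical assumptions, not the paper's documented procedure.

```python
import math

def softmax(logits):
    """Numerically stable softmax over raw classifier scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def order_templates(template_names, logits):
    """Template names sorted by predicted probability, highest first.
    sorted() is stable, so tied templates keep their input order."""
    probs = softmax(logits)
    ranked = sorted(zip(probs, template_names), key=lambda pair: -pair[0])
    return [name for _, name in ranked]

def _normalize(patch):
    # Collapse whitespace so trivially different patches count as duplicates.
    return " ".join(patch.split())

def build_candidates(ordered_templates, template_patches, neural_patches):
    """Template patches in classifier order, then neural patches, without duplicates."""
    streams = [template_patches.get(name, []) for name in ordered_templates]
    streams.append(neural_patches)
    seen, candidates = set(), []
    for stream in streams:
        for patch in stream:
            key = _normalize(patch)
            if key not in seen:
                seen.add(key)
                candidates.append(patch)
    return candidates

# Hypothetical example: three templates and classifier logits for one suspicious statement.
TEMPLATES = ["null_check", "mutate_condition", "mutate_return"]
LOGITS = [0.2, 2.0, -1.0]
order = order_templates(TEMPLATES, LOGITS)
# order == ["mutate_condition", "null_check", "mutate_return"]

candidates = build_candidates(
    order,
    {"null_check": ["if (x == null) return;"],
     "mutate_condition": ["if (a <= b)", "if (a >= b)"]},
    ["if (a  <= b)", "return a - b;"],  # hypothetical neural patches
)
```

Here the first neural patch differs from a template patch only in whitespace, so it is dropped, and the expanded candidate list gains only `return a - b;`. Placing neural patches after template patches is a design choice of this sketch; interleaving them by model confidence would be an equally plausible alternative.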
Enhancing IR-based Fault Localization using Large Language Models

Large Language Models for Test-Free Fault Localization

Multi-View Adaptive Contrastive Learning for Information Retrieval Based Fault Localization

Just-In-Time Defect Identification and Localization: A Two-Phase Framework

An empirical study of the effectiveness of IR-based bug localization for large-scale industrial projects

Improving IR-Based Bug Localization with Context-Aware Query Reformulation

Enhancing Fault Localization Through Ordered Code Analysis with LLM Agents and Self-Reflection

FlexFL: Flexible and Effective Fault Localization with Open-Source Large Language Models

Impact of Large Language Models of Code on Fault Localization

A Deep Dive into Large Language Models for Automated Bug Localization and Repair

AgentFL: Scaling LLM-based Fault Localization to Project-Level Context

Supporting Cross-language Cross-project Bug Localization Using Pre-trained Language Models

When Large Language Models Confront Repository-Level Automatic Program Repair: How Well They Done?

Evaluation and Improvement of Fault Detection for Large Language Models

Fault Localization with Code Coverage Representation Learning

A Quantitative and Qualitative Evaluation of LLM-Based Explainable Fault Localization

Large Language Models for Information Retrieval: A Survey

Watch out for Version Mismatching and Data Leakage! A Case Study of Their Influence in Bug Report Based Bug Localization Models

ALBFL: A Novel Neural Ranking Model for Software Fault Localization Via Combining Static and Dynamic Features

MTL-TRANSFER: Leveraging Multi-task Learning and Transferred Knowledge for Improving Fault Localization and Program Repair