Multi-View Adaptive Contrastive Learning for Information Retrieval Based Fault Localization

Chunying Zhou, Xiaoyuan Xie, Gong Chen, Peng He, Bing Li

2024-09-19

Abstract: Most studies have focused on information retrieval-based techniques for fault localization, which build representations for bug reports and source code files and match their semantic vectors through similarity measurement. However, such approaches often ignore useful information that might help improve localization performance, such as 1) the interaction relationship between bug reports and source code files; 2) the similarity relationship between bug reports; and 3) the co-citation relationship between source code files. In this paper, we propose a novel approach named Multi-View Adaptive Contrastive Learning for Information Retrieval Fault Localization (MACL-IRFL) to learn the above-mentioned relationships for software fault localization. Specifically, we first generate data augmentations from the report-code interaction view, the report-report similarity view, and the code-code co-citation view separately, and adopt a graph neural network to aggregate the information of bug reports or source code files from the three views in the embedding process. Moreover, we perform contrastive learning across these views. Our contrastive learning task forces the bug report representations to encode information shared by the report-report and report-code views, and the source code file representations to encode information shared by the code-code and report-code views, thereby alleviating the noise from auxiliary information. Finally, to evaluate the performance of our approach, we conduct extensive experiments on five open-source Java projects. The results show that our model improves over the best baseline by up to 28.93%, 25.57%, and 20.35% on Accuracy@1, MAP, and MRR, respectively.
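
The three metrics named in the abstract can be illustrated with a minimal sketch. The function names and the toy rankings below are hypothetical, and the average-precision normalization (dividing by the number of buggy files for each report) is one common convention that the abstract itself does not confirm.

```python
from typing import List, Set


def accuracy_at_k(rankings: List[List[str]], buggy: List[Set[str]], k: int = 1) -> float:
    """Fraction of bug reports with at least one buggy file in the top k."""
    hits = sum(
        1 for ranked, rel in zip(rankings, buggy)
        if any(f in rel for f in ranked[:k])
    )
    return hits / len(rankings)


def reciprocal_rank(ranked: List[str], rel: Set[str]) -> float:
    """1 / rank of the first buggy file, or 0 if none is retrieved."""
    for position, f in enumerate(ranked, start=1):
        if f in rel:
            return 1.0 / position
    return 0.0


def average_precision(ranked: List[str], rel: Set[str]) -> float:
    """Mean of precision values at each rank where a buggy file appears.

    Assumption: normalize by the total number of buggy files.
    """
    hits, total = 0, 0.0
    for position, f in enumerate(ranked, start=1):
        if f in rel:
            hits += 1
            total += hits / position
    return total / len(rel) if rel else 0.0


def mean_reciprocal_rank(rankings: List[List[str]], buggy: List[Set[str]]) -> float:
    return sum(reciprocal_rank(r, b) for r, b in zip(rankings, buggy)) / len(rankings)


def mean_average_precision(rankings: List[List[str]], buggy: List[Set[str]]) -> float:
    return sum(average_precision(r, b) for r, b in zip(rankings, buggy)) / len(rankings)


# Toy data (hypothetical): two bug reports, each with a ranked list of files.
toy_rankings = [["A.java", "B.java", "C.java"], ["B.java", "A.java", "C.java"]]
toy_buggy = [{"A.java"}, {"A.java", "C.java"}]

print(accuracy_at_k(toy_rankings, toy_buggy, k=1))
print(mean_reciprocal_rank(toy_rankings, toy_buggy))
print(mean_average_precision(toy_rankings, toy_buggy))
```

All three metrics reward placing truly buggy files near the top of the ranked list; Accuracy@1 looks only at the first position, while MAP and MRR take deeper positions into account.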

Software Engineering, Information Retrieval

What problem does this paper attempt to address?

This paper addresses several key problems in software fault localization, particularly the challenges facing information retrieval-based (IR-based) techniques. Specifically, the authors point out the following deficiencies in existing methods:

1. **Ignoring useful auxiliary information**: When building representations of bug reports and source code files and matching their semantic vectors, existing IR-based methods often ignore information that could improve localization performance, such as:
   - The interaction relationship between bug reports and source code files.
   - The similarity relationship between bug reports.
   - The co-citation relationship between source code files.
2. **Impact of text quality**: The performance of IR-based methods usually depends on the text quality of bug reports. When a bug report provides an insufficient text description, even very complex models struggle to achieve satisfactory performance.

To solve these problems, the authors propose a new method called Multi-View Adaptive Contrastive Learning for Information Retrieval Fault Localization (MACL-IRFL). The method aims to improve fault localization in the following ways:

- **Constructing a multi-view structure**: Generate data augmentations from three views (the report-code interaction view, the report-report similarity view, and the code-code co-citation view), and use a graph neural network (GNN) to aggregate information about bug reports or source code files during the embedding process.
- **Contrastive learning**: Perform contrastive learning across these views. The contrastive learning task forces bug report representations to encode information shared by the report-report and report-code views, and source code file representations to encode information shared by the code-code and report-code views, thereby reducing the impact of noise in the auxiliary information.

In this way, MACL-IRFL can use historical repair records and other auxiliary information at prediction time to compensate for the lack of repair history in new bug reports, while suppressing the noise caused by an overload of auxiliary information. Experiments on five open-source Java projects show that the method improves over the best baseline by up to 28.93%, 25.57%, and 20.35% on Accuracy@1, MAP, and MRR, respectively.
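
The two ideas above, per-view neighbor aggregation and cross-view contrast, can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: the summary does not specify the GNN layer or the loss, so the single mean-aggregation step, the InfoNCE-style loss, the temperature value, and all names (`mean_aggregate`, `info_nce`, the toy adjacency matrices) are assumptions. The "adaptive" component of MACL-IRFL is not modeled.

```python
import numpy as np


def mean_aggregate(adj, feats):
    """One message-passing step: each node averages its neighbors' features.

    Assumption: a plain mean aggregator stands in for the paper's GNN.
    adj has shape (n_targets, n_sources); feats has shape (n_sources, d).
    """
    adj = np.asarray(adj, dtype=float)
    feats = np.asarray(feats, dtype=float)
    degree = adj.sum(axis=1, keepdims=True)
    return adj @ feats / np.maximum(degree, 1.0)


def info_nce(view_a, view_b, temperature=0.2):
    """InfoNCE-style loss: row i of view_a should match row i of view_b
    and differ from every other row of view_b (the negatives)."""
    eps = 1e-12
    a = view_a / (np.linalg.norm(view_a, axis=1, keepdims=True) + eps)
    b = view_b / (np.linalg.norm(view_b, axis=1, keepdims=True) + eps)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Toy data (hypothetical): 4 bug reports, 3 source files, 8-dim features.
rng = np.random.default_rng(0)
report_feats = rng.normal(size=(4, 8))
code_feats = rng.normal(size=(3, 8))

# Report-report similarity view: which reports are similar to each other.
report_report_adj = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
# Report-code interaction view: which files fixed each historical report.
report_code_adj = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]

# Two embeddings of the same reports, one per view.
z_report_report = mean_aggregate(report_report_adj, report_feats)
z_report_code = mean_aggregate(report_code_adj, code_feats)

# Minimizing this loss pulls each report's two view embeddings together.
loss = info_nce(z_report_report, z_report_code)
print(loss)
```

The same loss could be applied to source code file embeddings from the code-code and report-code views; in training it would typically be added to the main ranking objective with a weighting coefficient.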

Enhancing IR-based Fault Localization using Large Language Models

Fusing Multi-Abstraction Vector Space Models for Concern Localization

Software Fault Localization Based on Multi-objective Feature Fusion and Deep Learning

ALBFL: A Novel Neural Ranking Model for Software Fault Localization Via Combining Static and Dynamic Features

Inferring Links Between Concerns and Methods with Multi-abstraction Vector Space Model.

LMACL: Improving Graph Collaborative Filtering with Learnable Model Augmentation Contrastive Learning

MTL-TRANSFER: Leveraging Multi-task Learning and Transferred Knowledge for Improving Fault Localization and Program Repair

Model-Aware Contrastive Learning: Towards Escaping the Dilemmas

Contrastive Learning for Multi-Modal Automatic Code Review

Enhancing Fault Localization Through Ordered Code Analysis with LLM Agents and Self-Reflection

MCL: Multi-view Enhanced Contrastive Learning for Chest X-ray Report Generation

BugRadar: Bug Localization by Knowledge Graph Link Prediction

Fault Localization from the Semantic Code Search Perspective

Watch out for Version Mismatching and Data Leakage! A Case Study of Their Influence in Bug Report Based Bug Localization Models

Improving IR-Based Bug Localization with Context-Aware Query Reformulation

D&C: A Divide-and-Conquer Approach to IR-based Bug Localization

AgentFL: Scaling LLM-based Fault Localization to Project-Level Context

Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval

Enhancing medical vision-language contrastive learning via inter-matching relation modelling

Information-guided signal multi-granularity contrastive feature learning for fault diagnosis with few labeled data