Abstract: Following the rapid development of deep learning, many studies in the field of fault localization (FL) have used deep learning to analyze statements' coverage information (i.e., executed or not executed) and test cases' results (i.e., failing or passing), and these approaches have shown a strong ability to identify suspicious statements potentially responsible for failures. However, they mainly attend to the binary information obtained from executing test cases and do not incorporate code snippets and their inner relationships into the learning process. Furthermore, how a complex deep learning model for FL reaches a particular decision is not transparent. These drawbacks may limit the effectiveness of FL. Recently, graph-based pre-training techniques have substantially improved the state of the art in a variety of code-related tasks such as natural language code search, clone detection, code translation, and code refinement. Meanwhile, interpretable machine learning tackles the problem of non-transparency and enables learning models to explain or present their behavior to humans in an understandable way. In this paper, our insight is to leverage the learning ability of graph-based pre-training techniques to learn a model that incorporates code snippets as well as their inner relationships into fault localization, and then to use interpretable machine learning to localize faulty statements. Thus, we propose CodeAwareFL, a code-aware fault localization technique with pre-training and interpretable machine learning. Concretely, CodeAwareFL constructs a variety of code snippets by executing test cases. Next, CodeAwareFL uses the code snippets to extract propagation chains, which show how a set of variables interact with each other to cause a failure. After that, a graph-based pre-trained model is customized for fault localization. CodeAwareFL takes the code snippets and their corresponding propagation chains as inputs, with test results as labels, to conduct the training process. Finally, CodeAwareFL evaluates the suspiciousness of statements with interpretable machine learning techniques. In the experimental study, we choose 12 large programs to conduct the comparison. The results show that CodeAwareFL achieves promising results (e.g., 32.43% of faults are ranked within the top 5) and is significantly better than 12 state-of-the-art baselines.
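To make the binary inputs mentioned in the abstract concrete, the following is a minimal sketch of how a coverage matrix (executed or not executed) and test results (failing or passing) can be turned into a suspiciousness ranking. It does not implement CodeAwareFL: it uses the classical Ochiai formula from spectrum-based fault localization as a stand-in, and the function names `ochiai_scores` and `rank_statements` as well as the toy data are assumptions made purely for illustration.

```python
import math


def ochiai_scores(coverage, failed):
    """Return one Ochiai suspiciousness score per statement.

    coverage[t][s] is 1 if test t executed statement s, else 0.
    failed[t] is True if test t failed, False if it passed.
    (Hypothetical helper; Ochiai is a classical baseline, not CodeAwareFL.)
    """
    total_failed = sum(failed)
    n_statements = len(coverage[0]) if coverage else 0
    scores = []
    for s in range(n_statements):
        # Number of failing / passing tests that executed statement s.
        ef = sum(1 for t, row in enumerate(coverage) if row[s] and failed[t])
        ep = sum(1 for t, row in enumerate(coverage) if row[s] and not failed[t])
        denom = math.sqrt(total_failed * (ef + ep))
        scores.append(ef / denom if denom else 0.0)
    return scores


def rank_statements(scores):
    """Return statement indices ordered from most to least suspicious."""
    return sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)


# Toy data (assumed): 3 tests over 4 statements.
coverage = [
    [1, 1, 0, 1],  # test 0, passing
    [1, 0, 1, 1],  # test 1, failing
    [1, 1, 1, 0],  # test 2, failing
]
failed = [False, True, True]

scores = ochiai_scores(coverage, failed)
ranking = rank_statements(scores)
print(ranking[:5])  # the five most suspicious statement indices
```

A metric such as "faults ranked within the top 5" then amounts to checking whether the index of the truly faulty statement appears in the first five entries of such a ranking. In this toy data, statement 2 is executed only by failing tests and therefore ranks first.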

Fault Localization Via Efficient Probabilistic Modeling of Program Semantics

Just-In-Time Defect Identification and Localization: A Two-Phase Framework

A General Noise-Reduction Framework for Fault Localization of Java Programs

Fault Localization from the Semantic Code Search Perspective

VsusFL: Variable-suspiciousness-based Fault Localization for novice programs

A fault localization approach based on fault propagation context

Can Automated Program Repair Refine Fault Localization?

An effective fault localization approach based on PageRank and mutation analysis

A Study of Modified Testing-Based Fault Localization Method

A Fault-Localization Approach Based on the Coincidental Correctness Probability

Can Automated Program Repair Refine Fault Localization? A Unified Debugging Approach

Combining Spectrum-Based Fault Localization and Statistical Debugging - An Empirical Study

Improving Fault Localization Using Model-domain Synthesized Failing Test Generation

Enhancing Fault Localization Through Ordered Code Analysis with LLM Agents and Self-Reflection

ALBFL: A Novel Neural Ranking Model for Software Fault Localization Via Combining Static and Dynamic Features

A Hybrid Approach to Fine-grained Automated Fault Localization

Wielding Statistical Fault Localization Statistically

A Combinatorial Testing-Based Approach to Fault Localization

Learning Test-Mutant Relationship for Accurate Fault Localisation

A Quantitative and Qualitative Evaluation of LLM-Based Explainable Fault Localization

Code-Aware Fault Localization with Pre-Training and Interpretable Machine Learning