Semantic context based coincidental correct test cases detection for fault localization

Jian Hu
DOI: https://doi.org/10.1007/s10515-024-00466-5
IF: 1.677
2024-08-20
Automated Software Engineering
Abstract:Fault localization is a process that aims to identify the potentially faulty statements responsible for program failures by analyzing runtime information. Therefore, the input code coverage matrix plays a crucial role in FL. However, the effectiveness of fault localization is compromised by the presence of coincidental correct test cases (CCTC) in the coverage matrix. These CCTC execute faulty code but do not result in program failures. To address this issue, many existing methods focus on identifying CCTC through cluster analysis. However, these methods have three problems. Firstly, identifying the optimal cluster count poses a considerable challenge in CCTC detection. Secondly, the effectiveness of CCTC detection is heavily influenced by the initial centroid selection. Thirdly, the presence of abundant fault-irrelevant statements within the raw coverage matrix introduces substantial noise for CCTC detection. To overcome these challenges, we propose SCD4FL: a semantic context-based CCTC detection method to enhance the coverage matrix for fault localization. SCD4FL incorporates and implements two key ideas: (1) SCD4FL uses the intersection of execution slices to construct a semantic context from the raw coverage matrix, effectively reducing noise during CCTC detection. (2) SCD4FL employs an expert-knowledge-based K-nearest neighbors (KNN) algorithm to detect the CCTC, effectively eliminating the requirement of determining the cluster number and initial centroid. To evaluate the effectiveness of SCD4FL, we conducted extensive experiments on 420 faulty versions of nine benchmarks using six state-of-the-art fault localization methods and two representative CCTC detection methods. The experimental results validate the effectiveness of our method in enhancing the performance of the six fault localization methods and two CCTC detection methods, e.g., the RNN method can be improved by 53.09% under the MFR metric.
computer science, software engineering
What problem does this paper attempt to address?