Combining Coverage and Expert Features with Semantic Representation for Coincidental Correctness Detection

Huan Xie, Yan Lei, Maojin Li, Meng Yan, Sheng Zhang
DOI: https://doi.org/10.1145/3691620.3695542
2024-01-01
Abstract: Coincidental correctness (CC) can mislead developers because it gives the impression that code is functioning correctly when it contains hidden faults. To mitigate the negative impact of CC test cases, extensive research has been conducted on their detection, employing either coverage-based or expert-based features, with promising results. Coverage and expert features each provide unique insights into program execution, yet the literature has not fully explored the combined potential of these two feature sets for CC detection. Additionally, the rich semantics of the test code and the focal method have not been fully utilized. We therefore propose CORE, a unified model that integrates coverage and expert features with semantic representations of test and focal methods to improve the detection of CC test cases. We conduct a comprehensive evaluation against six state-of-the-art baselines on the widely used Defects4J benchmark. The experimental results show that CORE outperforms the baselines in CC detection accuracy, with an average improvement of 40% in F1 score. An ablation experiment shows that the coverage features, expert features, and semantic representations each contribute to CORE. CORE also improves the effectiveness of spectrum-based and mutation-based fault localization (e.g., 50% improvement for the spectrum-based formula Dstar and 44% improvement for the mutation-based method MUSE under the relabeling strategy).
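To make the fault-localization claim concrete, here is a minimal sketch of how relabeling detected CC tests can change a Dstar ranking. It is an illustration under stated assumptions, not CORE itself: the set of detected CC tests is simply given as input, the exponent 2 is a common choice for Dstar that the abstract does not specify, "relabeling" is assumed to mean treating detected CC tests as failing, and the statement and test names (`s1`, `t1`, ...) are hypothetical.

```python
from typing import Dict, Set


def dstar(coverage: Dict[str, Set[str]], passed: Dict[str, bool],
          star: int = 2) -> Dict[str, float]:
    """Dstar suspiciousness per statement: ef**star / (ep + nf).

    coverage maps a test name to the statements it executes;
    passed maps a test name to True if the test passed.
    """
    statements = set().union(*coverage.values())
    total_failed = sum(1 for ok in passed.values() if not ok)
    scores = {}
    for stmt in statements:
        # ef / ep: failing / passing tests that execute stmt
        ef = sum(1 for t, cov in coverage.items() if stmt in cov and not passed[t])
        ep = sum(1 for t, cov in coverage.items() if stmt in cov and passed[t])
        nf = total_failed - ef  # failing tests that do not execute stmt
        denom = ep + nf
        scores[stmt] = float("inf") if denom == 0 else ef ** star / denom
    return scores


def relabel(passed: Dict[str, bool], cc_tests: Set[str]) -> Dict[str, bool]:
    """Assumed relabeling strategy: mark detected CC tests as failing."""
    return {t: (False if t in cc_tests else ok) for t, ok in passed.items()}


# Hypothetical data: s1 is the faulty statement; t2 executes it but still
# passes, i.e. t2 is coincidentally correct.
coverage = {
    "t1": {"s1", "s2"},
    "t2": {"s1", "s3"},
    "t3": {"s2", "s3"},
    "t4": {"s2"},
}
passed = {"t1": False, "t2": True, "t3": True, "t4": True}

before = dstar(coverage, passed)
after = dstar(coverage, relabel(passed, cc_tests={"t2"}))
```

In this toy example the CC test `t2` counts as passing evidence for `s1` before relabeling, which lowers its score. Once `t2` is treated as failing, `s1` is executed only by failing tests and its score becomes unbounded, separating it more clearly from `s2` and `s3`.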