A Cross-Project Defect Prediction Approach Based on Code Semantics and Cross-Version Structural Information
Yifan Zou, Huiqiang Wang, Hongwu Lv, Shuai Zhao, Haoye Tian
DOI: https://doi.org/10.1142/s0218194024500165
IF: 1.007
2024-01-01
International Journal of Software Engineering and Knowledge Engineering
Abstract: Context: Cross-project defect prediction (CPDP) has gained significant attention from the research community because of its potential for adoption by industry in realistic scenarios. Existing CPDP studies use static statistical features designed by experts, which might not capture the semantic and structural aspects of software, resulting in low defect-prediction accuracy. They also tend to overlook the valuable iterative information that version updates bring to mature software projects. Objective: This paper introduces DETECTOR, a novel CPDP approach based on coDE semanTic and cross-vErsion struCTural infORmation, which leverages cross-version features of the software to improve CPDP performance. Methods: DETECTOR parses source code to obtain Abstract Syntax Trees (ASTs) and a cross-version software network (Cross-SN) that consists of an internal class dependency network and cross-version class dependency edges. It uses an attention-based Bi-LSTM and simplified graph convolutional neural networks to automatically extract software features from the ASTs and the Cross-SN. The extracted features are fused using gate(·) to generate more effective cross-version features. Finally, the source project is selected to provide the data used to train the classifier that predicts defects. Results: Empirical studies on seven open-source Java projects show that (1) DETECTOR outperforms the state-of-the-art models in CPDP; (2) the proposed cross-version dependency edges contribute positively to DETECTOR's performance; (3) gate(·) outperforms existing feature-fusion strategies; and (4) more multi-version information enhances DETECTOR's performance. Conclusion: DETECTOR can predict more defects in CPDP and improve the accuracy and effectiveness of prediction.
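The Methods step can be illustrated with a minimal sketch: simplified graph convolution over a small cross-version class network, followed by a gated fusion of semantic and structural features. The abstract does not define gate(·), so the element-wise sigmoid gate below is an assumed, commonly used formulation and not necessarily the authors' definition. The function names `sgc_propagate` and `gate_fuse`, the symmetric treatment of dependency edges, and the toy four-node network are likewise assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sgc_propagate(adj, feats, k=2):
    """Simplified graph convolution: multiply feats k times by
    S = D^-1/2 (A + I) D^-1/2, with no nonlinearity between hops.
    adj is assumed symmetric (dependency direction is ignored here)."""
    n = len(adj)
    # Add self-loops, then normalise symmetrically by node degree.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    s = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    out = feats
    for _ in range(k):
        dim = len(out[0])
        out = [[sum(s[i][m] * out[m][d] for m in range(n)) for d in range(dim)]
               for i in range(n)]
    return out


def gate_fuse(semantic, structural, weights, bias):
    """Assumed element-wise gate: g = sigmoid(w_s*s + w_t*t + b),
    fused = g*s + (1 - g)*t for each feature dimension."""
    fused = []
    for s_val, t_val, (w_s, w_t), b in zip(semantic, structural, weights, bias):
        g = sigmoid(w_s * s_val + w_t * t_val + b)
        fused.append(g * s_val + (1.0 - g) * t_val)
    return fused


# Toy Cross-SN with nodes A@v1, B@v1, A@v2, B@v2 (hypothetical classes).
# Intra-version edges: A-B in each version.
# Cross-version edges: A@v1-A@v2 and B@v1-B@v2.
cross_sn = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
node_feats = [[1.0], [0.0], [0.5], [0.0]]
structural = sgc_propagate(cross_sn, node_feats, k=2)

# Fuse the structural feature of A@v2 with a made-up semantic feature.
fused = gate_fuse([0.8], structural[2], weights=[(1.0, 1.0)], bias=[0.0])
```

In a trained model the gate weights and bias would be learned jointly with the classifier; here they are fixed constants so that the data flow is easy to follow.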