SDP-MTF: A Composite Transfer Learning and Feature Fusion for Cross-Project Software Defect Prediction

Tianwei Lei, Jingfeng Xue, Duo Man, Yong Wang, Minghui Li, Zixiao Kong
DOI: https://doi.org/10.3390/electronics13132439
IF: 2.9
2024-06-22
Electronics
Abstract: Software defect prediction is critical for improving software quality and reducing maintenance costs. In recent years, Cross-Project software defect prediction has garnered significant attention from researchers. This approach leverages transfer learning to apply the knowledge from existing projects to new ones, thereby enhancing the universality of predictive models. It provides an effective solution for projects with limited historical defect data. Nevertheless, current methodologies face two main challenges: first, the inadequacy of feature information mining, where code statistical information or semantic information is used in isolation, ignoring the benefits of their integration; second, the substantial feature disparity between different projects, which can lead to insufficient effect during transfer learning, necessitating additional efforts to narrow this gap to improve precision. Addressing these challenges, this paper proposes a novel methodology, SDP-MTF (Software Defect Prediction using Multi-stage Transfer learning and Feature fusion), that combines code statistical features, deep semantic features, and multiple feature transfer learning methods to enhance the predictive effect. The SDP-MTF method was empirically tested on single-source cross-project software defect prediction across six projects from the PROMISE dataset, benchmarked against five baseline algorithms that employ distinct features and transfer methodologies. Our findings indicate that SDP-MTF significantly outperforms five classical baseline algorithms, improving the F1-Score by 8% to 15.2%, thereby substantively advancing the precision of cross-project software defect prediction.
engineering, electrical & electronic; computer science, information systems; physics, applied
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper addresses two core issues in cross-project software defect prediction:

1. **Insufficient Feature Information Mining**: Current methods typically use either code statistical features or semantic features alone, ignoring the benefit of combining them. Code statistical features and deep semantic features describe the source code from different perspectives, and integrating them can improve prediction accuracy.
2. **Significant Feature Differences**: Development processes, programming languages, and coding habits differ between projects, which leads to significant differences in data features and makes it difficult to apply existing knowledge directly in transfer learning. These differences must be reduced to improve prediction accuracy.

To solve these problems, the authors propose SDP-MTF (Software Defect Prediction using Multi-stage Transfer learning and Feature fusion), which enhances prediction performance by integrating code statistical features, deep semantic features, and several feature transfer learning techniques. The method proceeds in three stages:

1. A feature constructor extracts code statistical features and deep semantic features and fuses them into a single comprehensive feature set.
2. A composite transfer learning method, combining TCA+ with domain adaptation networks, reduces the distribution gap between the source domain and the target domain.
3. The transformed features are fed into a classifier, which makes the final defect prediction for the target project.

Experimental results show that SDP-MTF improves the F1-Score by 8% to 15.2% compared with five baseline algorithms, significantly enhancing the accuracy of cross-project software defect prediction.
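The overall shape of this pipeline (fuse features, narrow the gap between projects, classify, score with F1) can be illustrated with a minimal sketch. It is not the authors' implementation: the fusion is plain concatenation, per-project z-score standardization is a deliberately simple stand-in for the TCA+ and domain adaptation network stage, and a nearest-centroid rule replaces the paper's classifier. All function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev

def fuse(stat_feats, sem_feats):
    """Concatenate each file's statistical metrics with its semantic vector."""
    return [list(s) + list(e) for s, e in zip(stat_feats, sem_feats)]

def standardize(rows):
    """Z-score every column using this project's own mean and std.
    Assumption: a simple stand-in for the transfer step, not TCA+ itself."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def fit_centroids(X, y):
    """Compute the mean feature vector of each class in the source project."""
    centroids = {}
    for label in set(y):
        members = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(c) for c in zip(*members)]
    return centroids

def predict(centroids, X):
    """Assign each row to the class whose centroid is nearest."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda k: dist2(centroids[k], x)) for x in X]

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the defective class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy source project: [lines of code, complexity] plus a 2-d "semantic" vector.
src_stat = [[100, 2], [120, 3], [400, 9], [450, 10]]
src_sem = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.2], [0.9, 0.1]]
src_y = [0, 0, 1, 1]

# Toy target project: same pattern, but every feature is on a different scale.
tgt_stat = [[1000, 20], [1100, 25], [4200, 95], [4600, 100]]
tgt_sem = [[1.1, 1.9], [1.2, 1.8], [1.8, 1.2], [1.9, 1.1]]
tgt_y = [0, 0, 1, 1]

src_X, tgt_X = fuse(src_stat, src_sem), fuse(tgt_stat, tgt_sem)

# Without alignment: the source model is applied to raw target features.
f1_raw = f1_score(tgt_y, predict(fit_centroids(src_X, src_y), tgt_X))

# With alignment: each project is standardized before training and prediction.
model = fit_centroids(standardize(src_X), src_y)
f1_aligned = f1_score(tgt_y, predict(model, standardize(tgt_X)))

print(f"F1 without alignment: {f1_raw:.3f}")
print(f"F1 with alignment:    {f1_aligned:.3f}")
```

On this toy data the unaligned model labels every target file as defective because the target's raw values sit far from both source centroids, while the aligned model recovers the labels. The example only shows why narrowing the feature gap matters; it says nothing about the magnitude of the improvements reported in the paper.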