Adversarial Domain Adaptation for Cross-Project Defect Prediction

Hengjie Song, Guobin Wu, Le Ma, Yufei Pan, Qingan Huang, Siyu Jiang
DOI: https://doi.org/10.1007/s10664-023-10371-2
IF: 3.762
2023-01-01
Empirical Software Engineering
Abstract: Cross-Project Defect Prediction (CPDP) is an attractive approach to locating defects in projects with little labeled data (target projects) by using a prediction model trained on other projects with sufficient data (source projects). However, previous models may not fully capture the semantic features of programs because of inappropriate feature extraction models. Besides, when matching the two feature distributions with transfer learning methods, researchers may fail to consider the relationship between the decision boundary and the target project data, which leads to the misclassification of target samples that lie near the boundary. To handle these drawbacks, we propose a novel Adversarial Domain Adaptation (ADA) model for CPDP. Specifically, we leverage a Long Short-Term Memory network with an attention mechanism to extract semantic features that better represent programs. Then, we train two classifiers to correctly categorize source samples and to distinguish ambiguous target instances that influence prediction accuracy. Next, we treat the classifiers as a discriminator and the feature extraction model as a generator, and train them with adversarial learning methods to capture the desired relationship. Because the classifiers learn this relationship, they should attain better performance. Extensive experiments on two benchmark datasets are conducted to verify the effectiveness of the proposed ADA method. Experimental and statistical results show that ADA significantly outperforms other state-of-the-art baseline methods.
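The two-classifier idea in the abstract can be illustrated with a small sketch. The abstract does not state how disagreement between the classifiers is measured, so the mean absolute difference of softmax outputs used here is an assumption, and the function names and logit values are hypothetical. The sketch only shows how disagreement can flag target samples that sit near the decision boundary.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def discrepancy(logits_a, logits_b):
    """Mean absolute difference between two classifiers' class
    probabilities for one sample (assumed measure, not from the paper)."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    return sum(abs(a - b) for a, b in zip(pa, pb)) / len(pa)

def batch_discrepancy(batch_a, batch_b):
    """Average per-sample discrepancy over a batch."""
    return sum(discrepancy(a, b) for a, b in zip(batch_a, batch_b)) / len(batch_a)

# Hypothetical [clean, defective] logits from two classifiers
# on three unlabeled target-project samples.
clf1 = [[2.0, -1.0], [0.1, 0.0], [-1.5, 1.0]]
clf2 = [[2.1, -0.9], [-0.8, 0.9], [-1.4, 1.2]]

per_sample = [discrepancy(a, b) for a, b in zip(clf1, clf2)]
# Index of the sample the classifiers disagree on most.
ambiguous = max(range(len(per_sample)), key=per_sample.__getitem__)
print(per_sample, ambiguous)
```

In adversarial schemes of this kind, the classifiers are typically updated to increase such a disagreement on target data while staying accurate on source data, and the feature extractor is then updated to decrease it, which pushes target features away from the decision boundary. Whether ADA uses exactly this objective is not stated in the abstract.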