Conditional Domain Adversarial Adaptation for Heterogeneous Defect Prediction.

Lina Gong, Shujuan Jiang, Li Jiang
DOI: https://doi.org/10.1109/access.2020.3017101
IF: 3.9
2020-01-01
IEEE Access
Abstract: Heterogeneous defect prediction (HDP) has become a very active research field in software engineering. It aims to identify as many defect-prone modules as possible in a target project using prediction models built on a source project with a heterogeneous metric set. Researchers have already proposed several HDP models with promising performance. Most existing HDP models adopt unsupervised transfer learning to map the source project and the target project into the same feature space, which considers only the metric space and ignores the label information available from the source project and from a small part of the target project. Meanwhile, the predictive ability of these HDP models in an effort-aware context has not been compared. Therefore, we set out to investigate the effectiveness of label information for HDP, and to propose an HDP model that improves predictive performance in both classification and effort-aware contexts. To make use of this label information, we propose a novel conditional domain adversarial adaptation (CDAA) approach, motivated by generative adversarial networks (GANs), to tackle the heterogeneity problem in software defect prediction (SDP). The CDAA architecture contains three networks: one generator, one discriminator and one classifier. The generator learns how to transfer the source instance space to the target instance space. The discriminator learns how to identify the fake instances produced by the generator. The classifier learns how to correctly classify the labels of instances. In CDAA, the losses of both the classifier and the discriminator are back-propagated to the generator. To ensure a fair comparison between state-of-the-art methods and CDAA, we use AUC, MCC and $P_{opt}$ as measures in an evaluation on 28 open-source projects. Experimental results demonstrate that CDAA can take advantage of label information to map the source project to the target project effectively and to improve predictive performance. The results also demonstrate that CDAA is not affected by the number of metrics shared between the source project and the target project.
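The interplay of the three networks can be illustrated with a minimal sketch of the loss that reaches the generator for one labelled source instance. This is not the authors' implementation: the abstract does not give the network sizes, the exact loss formulation, or any weighting between the two terms, so the single linear layers, the binary cross-entropy losses, the weight `lam`, and the dimensions `SRC_DIM` and `TGT_DIM` below are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(weights, bias, x):
    """Apply one affine layer; weights is a list of rows, x a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one probability p and one label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def generator_loss(G, D, C, source_x, source_y, lam=1.0):
    """Combined loss for the generator on one labelled source instance.

    Assumed form: adversarial term + lam * classification term.
    """
    fake = linear(G["W"], G["b"], source_x)            # source -> target space
    d_prob = sigmoid(linear(D["W"], D["b"], fake)[0])  # D: P(instance is real)
    c_prob = sigmoid(linear(C["W"], C["b"], fake)[0])  # C: P(instance is defective)
    adv = bce(d_prob, 1)         # generator wants D to call the fake "real"
    cls = bce(c_prob, source_y)  # generated instance should keep its label
    return adv + lam * cls, adv, cls


def init(n_out, n_in):
    """Small random weights for one linear layer (illustrative only)."""
    return {"W": [[random.uniform(-0.5, 0.5) for _ in range(n_in)]
                  for _ in range(n_out)],
            "b": [0.0] * n_out}


random.seed(0)
SRC_DIM, TGT_DIM = 5, 3      # hypothetical heterogeneous metric counts

G = init(TGT_DIM, SRC_DIM)   # generator: source metrics -> target metrics
D = init(1, TGT_DIM)         # discriminator: real vs. generated
C = init(1, TGT_DIM)         # classifier: defective vs. clean

x = [random.random() for _ in range(SRC_DIM)]
total, adv, cls = generator_loss(G, D, C, x, source_y=1, lam=1.0)
print(f"adversarial={adv:.4f} classification={cls:.4f} total={total:.4f}")
```

In a full training loop the discriminator and classifier would be updated on their own losses in alternation with the generator; only the forward computation of the generator's objective is shown here. Setting `lam=0.0` reduces the objective to a purely adversarial mapping that ignores the source labels, which corresponds to the unsupervised setting the abstract contrasts CDAA with.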