Kernel Spectral Embedding Transfer Ensemble for Heterogeneous Defect Prediction.

Haonan Tong,Bin Liu,Shihai Wang
DOI: https://doi.org/10.1109/tse.2019.2939303
IF: 7.4
2019-01-01
IEEE Transactions on Software Engineering
Abstract:Cross-project defect prediction (CPDP) refers to predicting defects in the target project lacking of defect data by using prediction models trained on the historical defect data of other projects (i.e., source data). However, CPDP requires the source and target projects have common metric set (CPDP-CM). Recently, heterogeneous defect prediction (HDP) has drawn the increasing attention, which predicts defects across projects having heterogeneous metric sets. However, building high-performance HDP methods remains a challenge owing to several serious challenges including class imbalance problem, nonlinear, and the distribution differences between source and target datasets. In this paper, we propose a novel kernel spectral embedding transfer ensemble (KSETE) approach for HDP. KSETE first addresses the class-imbalance problem of the source data and then tries to find the latent common feature space for the source and target datasets by combining kernel spectral embedding, transfer learning, and ensemble learning. Experiments are performed on 22 public projects in both HDP and CPDP-CM scenarios in terms of multiple well-known performance measures such as, AUC, G-Measure, and MCC. The experimental results show that (1) KSETE improves the performance over previous HDP methods by at least 22.7, 138.9, and 494.4 percent in terms of AUC, G-Measure, and MCC, respectively. (2) KSETE improves the performance over previous CPDP-CM methods by at least 4.5, 30.2, and 17.9 percent in AUC, G-Measure, and MCC, respectively. It can be concluded that the proposed KSETE is very effective in both the HDP scenario and the CPDP-CM scenario.
What problem does this paper attempt to address?