An Improved Transfer Adaptive Boosting Approach for Mixed‐project Defect Prediction

Lina Gong, Shujuan Jiang, Li Jiang
DOI: https://doi.org/10.1002/smr.2172
2019-01-01
Journal of Software: Evolution and Process
Abstract: Software defect prediction (SDP) is an important research topic in software engineering because it can produce high-quality results when sufficient historical data for a project are available. Unfortunately, at the beginning of a project there are rarely enough data to build a defect prediction model. One possible solution in this scenario is to use data from other projects in the same company. In practice, however, using these data yields poor performance because the projects have different distributional characteristics. In addition, software has more non-defective instances than defective instances, which may cause a significant bias towards non-defective instances. To address these two problems, we propose an improved transfer adaptive boosting (ITrAdaBoost) approach for the setting in which a small number of labeled instances from the testing project are available. ITrAdaBoost employs the Matthews correlation coefficient (MCC) as the performance measure instead of the accuracy rate, and it uses asymmetric misclassification costs for non-defective and defective instances. Extensive experiments on 18 public projects from four datasets indicate that (a) our approach significantly outperforms state-of-the-art cross-project defect prediction (CPDP) approaches, and (b) our approach achieves prediction performance comparable to within-project prediction results. Consequently, the proposed approach can build an effective prediction model with a small number of labeled instances for mixed-project defect prediction (MPDP).
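The motivation for preferring MCC over the accuracy rate on imbalanced defect data can be illustrated with a small sketch. This is not the ITrAdaBoost algorithm itself; it only computes the two measures from confusion-matrix counts. The function names and the two sets of counts (a hypothetical project with 90 non-defective and 10 defective instances) are assumptions made for illustration and do not come from the paper.

```python
import math


def accuracy(tp, fp, tn, fn):
    """Fraction of instances classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention
    (assumed here) for degenerate predictions.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: "defective" is the positive class.
# Classifier A labels every instance as non-defective.
always_clean = dict(tp=0, fp=0, tn=90, fn=10)
# Classifier B finds most defects at the price of some false alarms.
finds_defects = dict(tp=8, fp=10, tn=80, fn=2)

for name, counts in [("always_clean", always_clean),
                     ("finds_defects", finds_defects)]:
    print(f"{name}: accuracy={accuracy(**counts):.3f}, mcc={mcc(**counts):.3f}")
```

In this made-up example the classifier that never reports a defect has the higher accuracy, while MCC ranks the classifier that actually finds defects higher, which is the kind of bias towards the majority class that the abstract describes.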