An Improved SDA Based Defect Prediction Framework for Both Within-Project and Cross-Project Class-Imbalance Problems

Xiao-Yuan Jing, Fei Wu, Xiwei Dong, Baowen Xu
DOI: https://doi.org/10.1109/TSE.2016.2597849
2017-01-01
Abstract: Background. Solving the class-imbalance problem of within-project software defect prediction (SDP) is an important research topic. Although some class-imbalance learning methods have been presented, there is still room for improvement. For cross-project SDP, we found that class-imbalanced source data usually lead to misclassification of defective instances. However, only one work has paid attention to this cross-project class-imbalance problem. Objective. We aim to provide effective solutions for both within-project and cross-project class-imbalance problems. Method. Subclass discriminant analysis (SDA), an effective feature learning method, is introduced to solve the problems. It can learn features with more powerful classification ability from the original metrics. For within-project prediction, we improve SDA to achieve balanced subclasses and propose the improved SDA (ISDA) approach. For cross-project prediction, we employ the semi-supervised transfer component analysis (SSTCA) method to make the distributions of source and target data consistent, and propose the SSTCA+ISDA prediction approach. Results. Extensive experiments on four widely used datasets indicate that: 1) the ISDA-based solution performs better than other state-of-the-art methods on the within-project class-imbalance problem; 2) the SSTCA+ISDA approach proposed for the cross-project class-imbalance problem significantly outperforms related methods. Conclusion. Within-project and cross-project class-imbalance problems greatly affect prediction performance, and we provide a unified and effective prediction framework for both problems.
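The abstract states that ISDA achieves balanced subclasses but does not say how. As a rough illustration of the idea only, and not the paper's algorithm, the sketch below splits the majority (non-defective) class into random chunks of roughly the minority-class size, so that every subclass has a comparable number of instances. The function name `balanced_subclasses`, the random partitioning, and the toy data are all assumptions made for this example.

```python
import random


def balanced_subclasses(defective, clean, seed=0):
    """Return [defective, chunk_1, ..., chunk_k], all of similar size.

    Hypothetical helper: random chunks stand in for whatever subclass
    division the actual ISDA approach uses, which the abstract omits.
    """
    rng = random.Random(seed)
    shuffled = list(clean)
    rng.shuffle(shuffled)
    # Number of majority subclasses: majority size over minority size, at least 1.
    k = max(1, round(len(shuffled) / max(1, len(defective))))
    # Deal instances round-robin so chunk sizes differ by at most one.
    chunks = [shuffled[i::k] for i in range(k)]
    return [list(defective)] + chunks


# Toy data: 10 defective vs. 90 non-defective instance ids (a 1:9 imbalance).
defective_ids = list(range(10))
clean_ids = list(range(100, 190))
groups = balanced_subclasses(defective_ids, clean_ids)
print([len(g) for g in groups])
```

With this toy input the single defective class and each non-defective chunk end up the same size, which is the kind of balance a subclass-based discriminant method could then exploit when estimating its scatter matrices.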