Heterogeneous Defect Prediction Via Exploiting Correlation Subspace

Ming Cheng, Guoqing Wu, Min Jiang, Hongyan Wan, Guoan You, Mengting Yuan
DOI: https://doi.org/10.18293/seke2016-090
2016-01-01
Abstract: Software defect prediction generally builds models from intra-project data. A lack of training data in the early stages of software testing limits the efficiency of prediction in practice. Researchers have therefore proposed cross-project defect prediction, which uses data from other projects. Most previous efforts assume that cross-project defect data share the same metric set, meaning that both the metrics used and the size of the metric set are identical across projects. In real scenarios, however, this assumption may not hold. In addition, software defect datasets suffer from class imbalance, which makes it harder for a learner to predict defects. In this paper, we advance canonical correlation analysis to derive a joint feature space that associates cross-project data, and we propose a novel support vector machine algorithm that incorporates the correlation transfer information into classifier design for cross-project prediction. Moreover, we take different misclassification costs into consideration so that the classifier is inclined to label a module as defective, alleviating the impact of imbalanced data. Experiments on public heterogeneous datasets from different projects show that our method is more effective than state-of-the-art methods.
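The cost-sensitive idea in the abstract can be illustrated with a minimal sketch. This is not the authors' support vector machine. It only shows the general minimum-expected-cost decision rule, under the assumption that some classifier already outputs a probability that a module is defective. The function names and the cost values (a missed defect costing five times a false alarm) are hypothetical and chosen purely for illustration.

```python
def predict_defective(p_defective, cost_fn=5.0, cost_fp=1.0):
    """Return True when labelling the module defective has the lower expected cost.

    p_defective: estimated probability that the module is defective.
    cost_fn: assumed cost of missing a defective module (false negative).
    cost_fp: assumed cost of flagging a clean module (false positive).
    """
    # Expected cost of predicting "clean": paid only if the module is defective.
    risk_if_clean = p_defective * cost_fn
    # Expected cost of predicting "defective": paid only if the module is clean.
    risk_if_defective = (1.0 - p_defective) * cost_fp
    return risk_if_defective < risk_if_clean


def decision_threshold(cost_fn=5.0, cost_fp=1.0):
    """Probability above which the rule predicts 'defective'."""
    return cost_fp / (cost_fp + cost_fn)
```

With equal costs the threshold is 0.5. Raising the cost of a missed defect lowers the threshold, so more modules are labelled defective, which is the direction of bias the abstract describes for countering class imbalance.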