Multi-granularity Selection of Training Data for Cross-project Defect Prediction

Yi-lu LI, Peng HE, Bing LI, Yu-tao MA
DOI: https://doi.org/10.3969/j.issn.1000-1220.2017.09.003
2017-01-01
Abstract: Cross-project defect prediction, which uses data from other projects to predict defects in a target project, provides a new perspective on the problem of limited training data encountered in traditional defect prediction. The quality of training data directly affects the performance of defect prediction, particularly in the cross-project scenario. In this paper, to analyze the impact of the selection granularity of training data on cross-project defect prediction, we propose a multi-granularity selection method based on the benchmarks. Experiments were then performed on 34 datasets from the PROMISE repository. The results show that the proposed method not only improves F-measure and G-measure, by 0.035 (10.4%) and 0.041 (9.6%) respectively, but also reduces the number of instances actually used for training. The results also show that Naïve Bayes outperforms the other classifiers, as indicated by improvements of 44.4% in F-measure and 59.2% in G-measure. Furthermore, performance can be improved by a further 25.8% if the weight of instances is considered during modeling.
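The abstract reports results in terms of F-measure and G-measure without defining them. The sketch below shows how the two metrics are commonly computed from a confusion matrix. It assumes the definitions often used in the defect prediction literature: F-measure as the harmonic mean of precision and recall, and G-measure as the harmonic mean of recall (pd) and 1 - pf, where pf is the false-positive rate. The abstract does not confirm these definitions, and the function names and example counts are hypothetical.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (assumed definition)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def g_measure(tp: int, fp: int, tn: int, fn: int) -> float:
    """Harmonic mean of recall (pd) and 1 - pf (assumed definition)."""
    pd = tp / (tp + fn) if (tp + fn) else 0.0   # probability of detection
    pf = fp / (fp + tn) if (fp + tn) else 0.0   # probability of false alarm
    specificity = 1.0 - pf
    if pd + specificity == 0:
        return 0.0
    return 2 * pd * specificity / (pd + specificity)


# Hypothetical confusion-matrix counts, for illustration only.
f = f_measure(tp=30, fp=20, fn=10)
g = g_measure(tp=30, fp=20, tn=40, fn=10)
print(round(f, 4), round(g, 4))
```

Under these assumed definitions, G-measure rewards a model for both finding defective instances and avoiding false alarms, which makes it less sensitive than F-measure to the class imbalance typical of defect datasets.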