Training Data Selection for Cross-Project Defection Prediction: Which Approach is Better?

Yi Bin, Kai Zhou, Hongmin Lu, Yuming Zhou, Baowen Xu
DOI: https://doi.org/10.1109/esem.2017.49
2017-01-01
Abstract: Background: Many relevancy filters have been proposed to select training data for building cross-project defect prediction (CPDP) models. However, there is so far no consensus on which relevancy filter is better for CPDP. Goal: In this paper, we conduct a thorough experiment to compare nine relevancy filters proposed in the recent literature. Method: Using 33 publicly available data sets, we compare not only the retaining ratio of the original training data and the overlapping degree among the retained data, but also the prediction performance of the resulting CPDP models under the ranking and classification scenarios. Results: In terms of retaining ratio and overlapping degree, there are important differences among these filters. In terms of defect prediction performance, the global filter always ranks in the first level. Conclusions: For practitioners, it appears that there is no need to filter source project data, as leaving the data unfiltered may lead to better defect prediction results.
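The two data-level measures named in the Method can be illustrated with a minimal sketch. The abstract does not give their formulas, so the definitions here are assumptions: the retaining ratio is taken as the fraction of original training instances a filter keeps, and the overlapping degree between two filters is taken as the Jaccard similarity of their retained sets. The function names and the instance IDs are hypothetical.

```python
def retaining_ratio(retained, original):
    """Fraction of the original training instances kept by a filter.

    Assumed definition: |retained ∩ original| / |original|.
    """
    original = set(original)
    if not original:
        raise ValueError("original training set is empty")
    return len(set(retained) & original) / len(original)


def overlap_degree(retained_a, retained_b):
    """Overlap between the instances retained by two filters.

    Assumed definition: Jaccard similarity, |A ∩ B| / |A ∪ B|.
    Two empty sets are treated as fully overlapping.
    """
    a, b = set(retained_a), set(retained_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical instance IDs, for illustration only.
original_ids = range(10)
kept_by_filter_a = {0, 1, 2, 3, 4, 5}
kept_by_filter_b = {4, 5, 6, 7}

ratio_a = retaining_ratio(kept_by_filter_a, original_ids)
overlap_ab = overlap_degree(kept_by_filter_a, kept_by_filter_b)
print(ratio_a, overlap_ab)
```

Under these assumed definitions, a filter that keeps every instance has a retaining ratio of 1, which is how an unfiltered (global) training set would appear in such a comparison.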