Is Bigger Data Better for Defect Prediction: Examining the Impact of Data Size on Supervised and Unsupervised Defect Prediction

Xinyue Liu, Yanhui Li
DOI: https://doi.org/10.1007/978-3-030-30952-7_16
2019-01-01
Abstract: Defect prediction helps software practitioners predict where bugs are likely to occur in software code regions. To improve the accuracy of defect prediction, dozens of supervised and unsupervised methods have been proposed and have achieved good results in this field. One limiting factor is that defect datasets are small, which restricts the scope of application of defect prediction models. In this study, we construct bigger defect datasets by merging available datasets with the same measurement dimension, and we check whether bigger data brings better defect prediction performance for supervised and unsupervised models. The results of our experiment reveal that a larger-scale dataset does not improve the performance of either supervised or unsupervised classifiers.