Defect Prediction on Unlabeled Datasets by Using Unsupervised Clustering

Jun Yang, Hongbing Qian
DOI: https://doi.org/10.1109/HPCC-SmartCity-DSS.2016.0073
2016-01-01
Abstract: Defect prediction on unlabeled datasets is one of the most active research areas in software engineering. Two families of techniques are generally used to address this problem: cross-project defect prediction (CPDP) and unsupervised defect prediction. The fundamental idea of CPDP is transfer learning: a prediction model built on labeled source projects is reused on the unlabeled target project. However, because data distributions differ among projects, the prediction performance of CPDP models varies from project to project. Unsupervised learning models, in turn, usually do not match supervised ones in terms of prediction performance, so many unsupervised prediction models require manual effort to achieve good results. Recently, a novel unsupervised learning approach that requires no manual effort and is based on the magnitude of metric values has been proposed, and it achieved good prediction performance on some datasets. Building on the heuristic behind this approach, this paper proposes ACL, a new approach for predicting defect proneness on unlabeled datasets. In our empirical study on 16 open source projects, the ACL models achieved the best prediction performance on 9 datasets in terms of F-measure, making them comparable to supervised learning models in terms of predictive power.
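The abstract does not describe how ACL itself works, so the following is not the authors' algorithm. It is only a minimal sketch, under stated assumptions, of what a labeling heuristic "based upon the magnitude of metric value" could look like, together with the standard F-measure used for evaluation. The function names, the use of medians as cutoffs, and the toy metric values are all assumptions made for illustration.

```python
from statistics import median


def label_by_metric_magnitude(rows):
    """Label each instance as defect-prone (True) or not (False).

    Assumed heuristic (not taken from the paper): a metric value counts as
    "high" when it exceeds that metric's median, and an instance is
    defect-prone when it has more high values than the median instance.
    """
    n_metrics = len(rows[0])
    # Per-metric cutoff: the median of that metric over all instances.
    cutoffs = [median(row[j] for row in rows) for j in range(n_metrics)]
    # For each instance, count how many of its metrics exceed the cutoff.
    counts = [
        sum(1 for j in range(n_metrics) if row[j] > cutoffs[j])
        for row in rows
    ]
    threshold = median(counts)
    return [count > threshold for count in counts]


def f_measure(predicted, actual):
    """Harmonic mean of precision and recall for the defect-prone class."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): each row is one module, each column one metric,
# for example lines of code, cyclomatic complexity, number of changes.
toy_metrics = [
    [10, 2, 1],
    [200, 15, 9],
    [30, 3, 2],
    [180, 12, 7],
]
toy_labels = label_by_metric_magnitude(toy_metrics)
```

No labeled data is needed to produce `toy_labels`; ground-truth labels are required only to score the result with `f_measure`.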