Defect prediction by using cluster ensembles

Yanhong Yang, Jun Yang, Hongbing Qian
DOI: https://doi.org/10.1109/ICACI.2018.8377533
2018-01-01
Abstract: Software defect prediction has become an active research topic in recent years and has received considerable attention. Much of this research focuses on within-project defect prediction, which requires historical data from the project itself. In practice, however, a new project often lacks sufficient training data. Cross-project defect prediction (CPDP) and unsupervised defect prediction have therefore been proposed to address this problem. CPDP models are generally trained on data from other projects and then predict the defect-proneness of modules in the target project. Because data distributions differ across projects, however, CPDP performance is highly volatile. To better handle unlabeled datasets, this paper focuses on unsupervised learning and proposes a new approach, Cluster Ensembles and Labeling (CEL), to predict defect-proneness on unlabeled datasets. Experimental results on 15 open-source projects show that the CEL model achieves predictive power comparable to that of supervised learning models.
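The abstract names CEL but does not describe how its base clusterings are combined or how the resulting clusters are labeled. The Python sketch below only illustrates the general idea of a cluster ensemble followed by a labeling step; it is not the authors' CEL algorithm. Several assumptions are made: the base clusterings are k-means runs with k = 2 and different random initializations, the consensus is taken from a co-association matrix thresholded at 0.5 (a simplified evidence-accumulation scheme), and the labeling rule, which marks clusters whose standardized metric totals exceed the overall median as defect-prone, is a hypothetical heuristic. All function names (`cel_sketch`, `co_association`, and so on) are hypothetical.

```python
import numpy as np


def standardize(X):
    """Z-score each metric column so that no single metric dominates distances."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # constant metrics carry no signal; avoid dividing by zero
    return (X - mu) / sd


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns one cluster id per row (module) of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each module to its nearest center (squared Euclidean distance).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def co_association(X, n_runs=20, k=2, seed=0):
    """Fraction of base clusterings in which each pair of modules shares a cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    C = np.zeros((n, n))
    for _ in range(n_runs):
        labels = kmeans(X, k, rng)  # each run starts from a different random init
        C += labels[:, None] == labels[None, :]
    return C / n_runs


def consensus(C, threshold=0.5):
    """Consensus clusters = connected components of the graph with edges C > threshold."""
    n = len(C)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if C[i, j] > threshold:
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    ids = {r: c for c, r in enumerate(dict.fromkeys(roots))}
    return np.array([ids[r] for r in roots])


def label_defect_prone(Z, clusters):
    """Hypothetical rule: a cluster is defect-prone if its mean per-module
    metric total exceeds the median metric total over all modules."""
    scores = Z.sum(axis=1)
    cutoff = np.median(scores)
    flags = np.zeros(len(Z), dtype=bool)
    for c in np.unique(clusters):
        members = clusters == c
        flags[members] = scores[members].mean() > cutoff
    return flags


def cel_sketch(X, n_runs=20, k=2, seed=0):
    """Cluster ensemble + labeling on an unlabeled module-by-metric matrix."""
    Z = standardize(np.asarray(X, dtype=float))
    C = co_association(Z, n_runs=n_runs, k=k, seed=seed)
    clusters = consensus(C)
    return label_defect_prone(Z, clusters)
```

As a usage example, `cel_sketch(np.vstack([low, high]))` on synthetic data, where `low` holds modules with small metric values and `high` holds modules with large ones, should flag only the `high` rows. Averaging over many base clusterings is what makes the consensus less sensitive to any single unlucky k-means initialization.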