Semi-Supervised Ensemble Learning Approach for Cross-Project Defect Prediction

Ji-Yuan HE, Zhao-Peng MENG, Xiang CHEN, Zan WANG, Xiang-Yu FAN
DOI: https://doi.org/10.13328/j.cnki.jos.005228
2017-01-01
Journal of Software
Abstract: Software defect prediction helps developers allocate testing resources more effectively by predicting whether a software module is defect-prone. Most defect prediction research focuses on within-project defect prediction, which requires sufficient training data from the same project. In real software development, however, a project that needs defect prediction is often new or lacks historical data. Cross-project defect prediction, which trains on data from several other projects and predicts on a different target project, has therefore become a hot research topic. Its main challenges are the difference in data distribution between source and target projects and the class imbalance within the datasets. Inspired by search-based software engineering, this paper proposes S3EL, a search-based semi-supervised ensemble learning approach. By adjusting the ratio of distribution in the training dataset, several Naïve Bayes classifiers are built as base learners; a small number of labeled target instances and a genetic algorithm are then used to combine these base classifiers into the final prediction model. S3EL is compared with recent and classical cross-project defect prediction approaches (such as Burak filter, Peters filter, TCA+, CODEP, and HYDRA) on the AEEEM and Promise datasets. The results show that S3EL achieves the best prediction performance in most cases under the F1 measure.
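The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that "adjusting the ratio of distribution" means resampling the source data to different defective-to-clean ratios, and it combines the base learners by a weighted average of their predicted probabilities. The synthetic two-feature data, the simple elitist genetic algorithm, and every function name and parameter value (`make_project`, `resample`, `evolve_weights`, the ratios, population size, and so on) are hypothetical choices made for illustration only.

```python
import math
import random

def make_project(n, defect_rate, shift, rng):
    """Synthetic stand-in for a project's metric data (assumption, not real data)."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < defect_rate else 0
        centre = shift + (1.5 if label else 0.0)
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    return X, y

def resample(X, y, ratio, n, rng):
    """Draw n instances with replacement so that roughly `ratio` of them are defective."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    k = max(1, min(n - 1, round(n * ratio)))
    return rng.choices(pos, k=k) + rng.choices(neg, k=n - k), [1] * k + [0] * (n - k)

def fit_gnb(X, y):
    """Gaussian Naive Bayes: per-class prior, feature means, and feature variances."""
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        varis = [sum((v - m) ** 2 for v in col) / len(col) + 1e-6
                 for col, m in zip(cols, means)]
        model[c] = (len(rows) / len(X), means, varis)
    return model

def prob_defective(model, x):
    """Posterior probability that instance x is defective (class 1)."""
    logp = {}
    for c, (prior, means, varis) in model.items():
        logp[c] = math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
            for v, m, s in zip(x, means, varis))
    top = max(logp.values())
    e0, e1 = math.exp(logp[0] - top), math.exp(logp[1] - top)
    return e1 / (e0 + e1)

def ensemble_predict(models, weights, X):
    """Weighted average of base-learner probabilities, thresholded at 0.5."""
    total = sum(weights) or 1.0
    return [int(sum(w * prob_defective(m, x) for w, m in zip(weights, models)) / total >= 0.5)
            for x in X]

def f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def evolve_weights(models, X_lab, y_lab, rng, pop_size=20, generations=30):
    """Elitist GA over weight vectors; fitness is F1 on the labeled target instances."""
    def fitness(w):
        return f1(y_lab, ensemble_predict(models, w, X_lab))
    pop = [[rng.random() for _ in models] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, len(a))            # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(len(child))             # mutate one weight, clamp to [0, 1]
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.2)))
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

rng = random.Random(0)
X_src, y_src = make_project(400, 0.15, 0.0, rng)      # imbalanced source project
X_tgt, y_tgt = make_project(200, 0.30, 0.5, rng)      # target with shifted distribution
ratios = [0.1, 0.3, 0.5, 0.7, 0.9]
models = [fit_gnb(*resample(X_src, y_src, r, 200, rng)) for r in ratios]
X_lab, y_lab = X_tgt[:20], y_tgt[:20]                 # small labeled target subset
weights = evolve_weights(models, X_lab, y_lab, rng)
y_pred = ensemble_predict(models, weights, X_tgt[20:])
score = f1(y_tgt[20:], y_pred)
print("weights:", [round(w, 2) for w in weights], "F1 on unlabeled target:", round(score, 3))
```

The design point the sketch tries to capture is that the source data is used only to train the base learners, while the few labeled target instances are used only to search for the combination weights, so the final model is tuned toward the target project's distribution.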