Research of software fault prediction based on PU learning

He Zhang, Mei Li, Yang Zhang, Xiaoyan Cai
DOI: https://doi.org/10.3969/j.issn.1001-3695.2015.11.028
2015-01-01
Abstract: In software fault datasets, it is highly likely that only a small set of positive data is labeled, while most of the data is hard to label, even though it contains a great deal of useful information for building a software fault prediction model. This paper proposed a semi-supervised classification model that predicts faults using only the positive and unlabeled data available during the software development process. The proposed method first used SMOTE (synthetic minority over-sampling technique) to balance the class distribution by oversampling the rare positive data. It then partitioned the rebalanced dataset into a positive subset and an unlabeled subset. Finally, it used the POSC4.5 algorithm together with Bagging to build an ensemble of decision tree classifiers for software fault prediction from these subsets. Experiments were conducted on 12 datasets from the NASA MDP database. The results show that the fault detection rate achieved with positive and unlabeled learning is close to that of supervised learning, that the ensemble classifier improves detection performance more effectively than a single classifier, and that the unlabeled level affects fault detection to some extent.
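The first step of the method, SMOTE oversampling, creates synthetic minority samples. Each one is placed on the line segment between a minority sample and one of its k nearest minority-class neighbours. The sketch below is a minimal pure-Python illustration of that idea, not the authors' implementation. The function name `smote` and its parameters are hypothetical. It picks the base sample at random for each synthetic point, a common variant of the original per-sample scheme. It also assumes each positive (minority) instance is a list of numeric metrics.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic samples by SMOTE-style interpolation (sketch).

    minority: list of feature vectors (lists of floats) from the rare class.
    Each synthetic point lies between a randomly chosen minority sample and
    one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [x for j, x in enumerate(minority) if j != i]
        others.sort(key=lambda x: math.dist(base, x))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic
```

For example, `smote([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]], n_synthetic=4, k=2)` returns four new points. Each point falls inside the region spanned by the original positives, so the class is densified rather than extended.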
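POSC4.5 grows a decision tree even though no node contains labeled negatives. As we understand it, it replaces C4.5's class counts with estimates computed from positive and unlabeled counts. The sketch below shows one such estimate, obtained with Bayes' rule, and plugs it into an information-gain computation. It is a hedged illustration, not the paper's exact criterion. All function names are hypothetical. `pos_level`, the prior probability of the positive class, is assumed known or estimated beforehand. Weighting children by their share of unlabeled examples is also an assumption.

```python
import math


def estimated_positive_rate(pos_in_node, pos_total, unl_in_node, unl_total, pos_level):
    """Estimate P(positive | x in node) from positive/unlabeled counts via Bayes' rule.

    P(x in node | positive) is estimated by pos_in_node / pos_total,
    P(x in node)            is estimated by unl_in_node / unl_total,
    P(positive)             is pos_level (assumed known or estimated).
    """
    if unl_in_node == 0:
        # Heuristic (assumption): no unlabeled mass left, treat the node as pure.
        return 1.0 if pos_in_node > 0 else 0.0
    p = (pos_in_node / pos_total) * (unl_total / unl_in_node) * pos_level
    return min(1.0, p)  # counts are noisy, so clip to a valid probability


def binary_entropy(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def pu_information_gain(parent, children, pos_total, unl_total, pos_level):
    """Information gain of a split; every node is given as (pos_count, unl_count).

    Children are weighted by their share of the parent's unlabeled examples,
    treating the unlabeled set as a sample of the overall distribution.
    """
    def node_entropy(node):
        pos_n, unl_n = node
        return binary_entropy(
            estimated_positive_rate(pos_n, pos_total, unl_n, unl_total, pos_level))

    parent_unl = parent[1]
    weighted = sum((child[1] / parent_unl) * node_entropy(child) for child in children)
    return node_entropy(parent) - weighted
```

At the root, the estimate reduces to `pos_level` itself. A split that concentrates positives relative to unlabeled data, such as `(10, 100)` into `(8, 30)` and `(2, 70)`, yields a positive gain. A split whose children keep the parent's positive-to-unlabeled ratio yields zero gain.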
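The ensemble step combines the tree learner with Bagging: several base models are trained on bootstrap resamples, and their predictions are combined by majority vote. The abstract does not say how the bags are drawn. The sketch below therefore assumes that the positive and unlabeled subsets are bootstrapped separately, so every bag still contains both. `train_threshold` is a hypothetical toy learner on 1-D data standing in for POSC4.5. `bagging_fit` and `bagging_predict` are likewise illustrative names.

```python
import random
from collections import Counter


def bagging_fit(pos, unl, train_fn, n_estimators=11, seed=0):
    """Train an ensemble; each member sees a bootstrap of P and a bootstrap of U.

    pos/unl are the positive and unlabeled subsets. They are resampled
    separately (an assumption) so that every bag contains both.
    train_fn(pos_bag, unl_bag) must return a callable model: x -> label.
    """
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        pos_bag = [rng.choice(pos) for _ in pos]  # sample with replacement
        unl_bag = [rng.choice(unl) for _ in unl]
        models.append(train_fn(pos_bag, unl_bag))
    return models


def bagging_predict(models, x):
    """Majority vote of the ensemble members (an odd size avoids ties)."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


def train_threshold(pos_bag, unl_bag):
    """Hypothetical toy base learner on 1-D data, standing in for POSC4.5.

    Predicts 1 (positive) when x is at or above the midpoint between the
    mean positive value and the mean unlabeled value.
    """
    threshold = (sum(pos_bag) / len(pos_bag) + sum(unl_bag) / len(unl_bag)) / 2
    return lambda x: 1 if x >= threshold else 0
```

With `pos = [5.0, 6.0, 7.0, 8.0]` and `unl = [0.0, 1.0, 2.0, 6.5, 1.5, 0.5]`, `bagging_predict(bagging_fit(pos, unl, train_threshold), 9.0)` votes positive. Every member's threshold lies below 9.0. Averaging many resampled learners reduces the variance of unstable base models such as decision trees, which is consistent with the abstract's finding that the ensemble outperforms a single classifier.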