Software defect prediction using semi-supervised support vector machine with sampling

Shengping LIAO, Ling XU, Meng YAN
DOI: https://doi.org/10.3778/j.issn.1002-8331.1601-0447
2017-01-01
Abstract: Software defect prediction helps improve software quality and allocate test resources effectively. To tackle two practical yet important issues in software defect prediction, namely that labeled data is hard to collect and that the classes are imbalanced, a sampling-based semi-supervised support vector machine method is proposed. The method uses an unsupervised sampling approach to select a small percentage of modules to be tested and labeled, and this sampling ensures that the defective instances in the training set are not too few. The semi-supervised support vector machine algorithm then combines the few labeled data with unlabeled data to build a predictor, so that the model can exploit the information in the unlabeled data. In an evaluation on four NASA projects, the experimental results show that the proposed approach achieves performance comparable to that of supervised learning models while using little defect information. Moreover, the proposed method outperforms other semi-supervised learning methods in terms of recall and F-measure.
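As a reading aid for the two metrics named above, here is a minimal sketch of how recall and F-measure are commonly computed for the defective class. It assumes the standard definitions (F-measure as the harmonic mean of precision and recall); the abstract does not state the exact formulas used. The function name `defect_class_scores` and the labels are hypothetical and are not data from the paper.

```python
def defect_class_scores(y_true, y_pred):
    """Return (precision, recall, f_measure) for the defective class (label 1).

    Assumes the standard definitions; the paper's exact formulas are not
    given in the abstract.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defects found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defects missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical, imbalanced example: 3 defective modules out of 10.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

precision, recall, f_measure = defect_class_scores(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_measure:.3f}")
```

With imbalanced data, these metrics are more informative than plain accuracy, because a predictor that labels every module as clean can be highly accurate while finding no defects at all.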