Revisiting heterogeneous defect prediction methods: How far are we?

Xiang Chen, Yanzhou Mu, Ke Liu, Zhanqi Cui, Chao Ni
DOI: https://doi.org/10.1016/j.infsof.2020.106441
IF: 3.9
2021-02-01
Information and Software Technology
Abstract:
Context: Cross-project defect prediction applies to scenarios in which the target projects are new. Most previous studies tried to use training data from other projects (i.e., the source projects). However, the metrics practitioners use to measure program modules extracted from different projects may not be the same, which makes heterogeneous defect prediction (HDP) challenging.
Objective: Researchers have proposed many novel HDP methods with promising performance. Recently, unsupervised defect prediction (UDP) methods have received more attention and shown competitive performance. However, to the best of our knowledge, whether HDP methods can perform significantly better than UDP methods has not yet been thoroughly investigated.
Method: In this article, we perform a comparative study to take a holistic look at this issue. Specifically, we compare five HDP methods with four UDP methods on 34 projects in five groups under the same experimental setup from three perspectives: non-effort-aware performance indicators (NPIs), effort-aware performance indicators (EPIs), and diversity analysis on identifying defective modules.
Result: We have the following findings: (1) HDP methods do not perform significantly better than some of the UDP methods in terms of two NPIs and four EPIs. (2) According to two satisfactory criteria recommended by previous studies, the satisfactory ratio of existing HDP methods is pessimistic. (3) The diversity of prediction for defective modules across HDP vs. UDP methods is greater than that within HDP methods or within UDP methods.
Conclusion: The above findings indicate that the HDP issue still has a long way to go. Given this, we present some observations about the road ahead for HDP.
computer science, information systems, software engineering