Unveiling the Impact of Unchanged Modules Across Versions on the Evaluation of Within-Project Defect Prediction Models

Xutong Liu, Yufei Zhou, Zeyu Lu, Yuanqing Mei, Yibiao Yang, Junyan Qian, Yuming Zhou
DOI: https://doi.org/10.1002/smr.2715
2024-01-01
Abstract:
Background: Software defect prediction (SDP) is a topic actively researched in the software engineering community. Within-project defect prediction (WPDP) involves using labeled modules from previous versions of the same project to train classifiers. Over time, many defect prediction models have been evaluated under the WPDP scenario.
Problem: Data duplication poses a significant challenge in current WPDP evaluation procedures. Unchanged modules, characterized by identical executable source code, are frequently present in both target and source versions during experimentation. However, it is still unclear how and to what extent the presence of unchanged modules affects the performance assessment of WPDP models and the comparison of multiple WPDP models.
Method: In this paper, we provide a method to detect and remove unchanged modules from defect datasets and unveil the impact of data duplication in WPDP on model evaluation.
Results: The experiments conducted on 481 target versions from 62 projects provide evidence that data duplication significantly affects the reported performance values of individual learners in WPDP. However, when ranking multiple WPDP models based on prediction performance, the impact of removing unchanged instances is not substantial. Nevertheless, it is important to note that removing unchanged instances does have a slight influence on the selection of models with better generalization.
Conclusion: We recommend that future WPDP studies take into consideration the removal of unchanged modules from target versions when evaluating the performance of their models. This practice will enhance the reliability and validity of the results obtained in WPDP research, leading to improved understanding and advancements in defect prediction models.

We provide a method to detect and remove duplicate modules from defect datasets and unveil the impact of data duplication in WPDP on model evaluation. The experiments provide evidence that data duplication significantly affects the reported performance values of individual WPDP models. We recommend that future WPDP studies take into consideration the removal of duplicate modules from target versions when evaluating the performance of their models to enhance the reliability and validity of the results obtained in WPDP research.
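The abstract does not spell out how unchanged modules are detected, so the following is only a minimal sketch of one plausible approach, not the authors' implementation. It assumes that modules are source files with C-style comments, that "identical executable source code" can be approximated by stripping comments and collapsing whitespace, and that a module is matched across versions by its path. The names `normalize`, `fingerprint`, and `drop_unchanged` are hypothetical.

```python
import hashlib
import re

# Matches string/char literals (kept as-is) or C-style comments (dropped).
# Assumption: the modules use C-style comment syntax (e.g. Java or C).
_TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'      # double-quoted string literal
    r"|'(?:\\.|[^'\\])*'"     # single-quoted character literal
    r"|//[^\n]*"              # line comment
    r"|/\*.*?\*/",            # block comment
    re.DOTALL,
)


def normalize(source: str) -> str:
    """Drop comments and collapse whitespace.

    This is a rough proxy for "executable source code": whitespace inside
    string literals is collapsed too, which a real tool might want to avoid.
    """
    def keep_literals(match):
        text = match.group(0)
        return " " if text.startswith(("//", "/*")) else text

    without_comments = _TOKEN.sub(keep_literals, source)
    return " ".join(without_comments.split())


def fingerprint(source: str) -> str:
    """Hash the normalized code so versions can be compared cheaply."""
    return hashlib.sha256(normalize(source).encode("utf-8")).hexdigest()


def drop_unchanged(target: dict[str, str],
                   source_versions: list[dict[str, str]]) -> dict[str, str]:
    """Return the target modules that are not unchanged copies.

    A target module is treated as unchanged if a module with the same path
    in any source (training) version has the same fingerprint.
    """
    seen: dict[str, set[str]] = {}
    for version in source_versions:
        for path, code in version.items():
            seen.setdefault(path, set()).add(fingerprint(code))
    return {
        path: code
        for path, code in target.items()
        if fingerprint(code) not in seen.get(path, set())
    }
```

Under these assumptions, a module whose only difference from an earlier version is a comment or its indentation is treated as unchanged and removed from the target version, while modified and newly added modules are kept for evaluation.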