Abstract:

Background: Software defect prediction (SDP) is a topic actively researched in the software engineering community. Within-project defect prediction (WPDP) involves using labeled modules from previous versions of the same project to train classifiers. Over time, many defect prediction models have been evaluated under the WPDP scenario.

Problem: Data duplication poses a significant challenge in current WPDP evaluation procedures. Unchanged modules, characterized by identical executable source code, are frequently present in both target and source versions during experimentation. However, it is still unclear how and to what extent the presence of unchanged modules affects the performance assessment of WPDP models and the comparison of multiple WPDP models.

Method: In this paper, we provide a method to detect and remove unchanged modules from defect datasets and unveil the impact of data duplication in WPDP on model evaluation.

Results: The experiments conducted on 481 target versions from 62 projects provide evidence that data duplication significantly affects the reported performance values of individual learners in WPDP. However, when ranking multiple WPDP models based on prediction performance, the impact of removing unchanged instances is not substantial. Nevertheless, it is important to note that removing unchanged instances does have a slight influence on the selection of models with better generalization.

Conclusion: We recommend that future WPDP studies take into consideration the removal of unchanged modules from target versions when evaluating the performance of their models. This practice will enhance the reliability and validity of the results obtained in WPDP research, leading to improved understanding and advancements in defect prediction models.

We provide a method to detect and remove duplicate modules from defect datasets and unveil the impact of data duplication in WPDP on model evaluation. The experiments provide evidence that data duplication significantly affects the reported performance values of individual WPDP models. We recommend that future WPDP studies take into consideration the removal of duplicate modules from target versions when evaluating the performance of their models to enhance the reliability and validity of the results obtained in WPDP research.
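The abstract does not spell out how unchanged modules are detected, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the paper's method. It assumes that modules are matched across versions by name, that "identical executable source code" can be approximated by comparing token sequences after stripping C/Java-style comments and layout whitespace, and it uses F1 as an example metric. The `Module` class, the `fingerprint`, `unchanged_names` and `f1` helpers, and the toy data are all hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass
class Module:
    name: str                # hypothetical identifier, e.g. a file or class path
    code: str                # raw source text of the module
    buggy: bool              # ground-truth defect label
    predicted: bool = False  # classifier output (only meaningful for target modules)


def fingerprint(code: str) -> str:
    """Hash code after removing C/Java-style comments and layout-only whitespace.

    Assumption: this crude normalization approximates "identical executable
    source code"; it ignores string literals that contain comment markers.
    """
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)  # drop block comments
    code = re.sub(r"//[^\n]*", "", code)                    # drop line comments
    tokens = re.findall(r"\w+|\S", code)                    # words and single symbols
    return hashlib.sha256(" ".join(tokens).encode("utf-8")).hexdigest()


def unchanged_names(source: list[Module], target: list[Module]) -> set[str]:
    """Names of target modules whose normalized code equals the same-named source module."""
    source_fps = {m.name: fingerprint(m.code) for m in source}
    return {m.name for m in target if source_fps.get(m.name) == fingerprint(m.code)}


def f1(modules: list[Module]) -> float:
    """F1 score of the defective class; defined as 0.0 when there are no true positives."""
    tp = sum(m.buggy and m.predicted for m in modules)
    fp = sum((not m.buggy) and m.predicted for m in modules)
    fn = sum(m.buggy and (not m.predicted) for m in modules)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): Foo.java differs from the source version only in layout and comments.
source = [
    Module("Foo.java", "int add(int a,int b){return a+b;} // helper", buggy=True),
    Module("Bar.java", "void run(){ step(); }", buggy=False),
]
target = [
    Module("Foo.java", "int add(int a, int b) {\n  return a + b;\n}", buggy=True, predicted=True),
    Module("Bar.java", "void run(){ step(); log(); }", buggy=True, predicted=False),
    Module("Baz.java", "void init(){}", buggy=False, predicted=False),
]

same = unchanged_names(source, target)
changed = [m for m in target if m.name not in same]
print(sorted(same))           # ['Foo.java']
print(round(f1(target), 3))   # 0.667 on all target modules
print(round(f1(changed), 3))  # 0.0 once the unchanged module is removed
```

In this toy case the only correctly predicted defect sits in a module that is unchanged from the training version, so the score on the full target version looks better than the score on the modules that actually changed, which is the kind of inflation the abstract describes. Matching by name and hashing normalized tokens keeps the check cheap; a real pipeline would more likely compare parsed or compiled representations.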

Towards a Framework for Reliable Performance Evaluation in Defect Prediction

Toward a consistent performance evaluation for defect prediction models

Unifying Defect Prediction, Categorization, and Repair by Multi-Task Deep Learning

HYDRA: Massively Compositional Model for Cross-Project Defect Prediction

How Far We Have Progressed in the Journey? An Examination of Cross-Project Defect Prediction

Unveiling the Impact of Unchanged Modules Across Versions on the Evaluation of Within-Project Defect Prediction Models

Software defect prediction based on nested-stacking and heterogeneous feature selection

A Hybrid Sampling and Multi-Objective Optimization Approach for Enhanced Software Defect Prediction

Revisiting heterogeneous defect prediction methods: How far are we?

Effort-aware Just-in-time Defect Prediction: Simple Unsupervised Models Could Be Better Than Supervised Models

A Software Defect Prediction Approach Based on Hybrid Feature Dimensionality Reduction

Compressed C4.5 Models for Software Defect Prediction

Performance evaluation of software defect prediction with NASA dataset using machine learning techniques

Software Defect Prediction and Localization with Attention-Based Models and Ensemble Learning

Local Versus Global Models for Just-In-Time Software Defect Prediction

An Empirical Study on Heterogeneous Defect Prediction Approaches

Enhancing Defect Prediction with Static Defect Analysis

The Probabilistic Bounds on the Feasibility of the Defect Prediction Models in Real-World Testing Environments

Multi-project Regression Based Approach for Software Defect Number Prediction

A Procedure to Continuously Evaluate Predictive Performance of Just-In-Time Software Defect Prediction Models During Software Development

SDP-MTF: A Composite Transfer Learning and Feature Fusion for Cross-Project Software Defect Prediction