Abstract: Just-in-Time (JIT) defect prediction, a technique that aims to predict bugs at the change level, has received increasing attention. JIT defect prediction leverages the SZZ approach to identify bug-introducing changes. Recently, researchers found that the performance of SZZ (including its variants) is impacted by a large amount of noise. SZZ may substantially mislabel the changes that are used to train a JIT defect prediction model, and thus impact the prediction accuracy. In this paper, we investigate the impact of the changes mislabeled by different SZZ variants on the performance and interpretation of JIT defect prediction models. We analyze four SZZ variants (i.e., B-SZZ, AG-SZZ, MA-SZZ, and RA-SZZ) proposed by prior studies. We build prediction models using the data labeled by each of these four variants. Among the four variants, RA-SZZ is the least likely to generate mislabeled changes, so we construct the testing set using RA-SZZ. All four prediction models are then evaluated on the same testing set. We choose the model built on the data labeled by RA-SZZ as the baseline, and we compare the performance and metric importance of the models trained on the data labeled by the other three variants against this baseline. Through a large-scale empirical study on a total of 126,526 changes from ten Apache open source projects, we find that in terms of various performance measures (AUC, F1-score, G-mean and Recall@20%), the changes mislabeled by B-SZZ and MA-SZZ are not likely to cause a considerable performance reduction, while the changes mislabeled by AG-SZZ cause a statistically significant performance reduction with an average difference of 1-5 percent. When considering developers' inspection effort (measured by LOC) in practice, the changes mislabeled by B-SZZ and AG-SZZ lead to 9-10 and 1-15 percent more wasted inspection effort, respectively, and the changes mislabeled by B-SZZ lead to significantly more wasted effort. The changes mislabeled by MA-SZZ do not cause considerably more wasted effort. We also find that the most important metric for identifying bug-introducing changes (i.e., the number of files modified in a change) is robust to the mislabeling noise generated by SZZ. The second- and third-most important metrics, however, are more likely to be impacted by the mislabeling noise, unless random forest is used as the underlying classifier.
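
To make two of the less common measures named above concrete, the following is a minimal sketch and not the study's own implementation. It assumes that G-mean is the geometric mean of the recall on bug-introducing changes and the recall on clean changes, and that Recall@20% is the share of bug-introducing changes reached when changes are inspected in descending order of predicted risk until 20 percent of the total LOC is spent. The function names `g_mean` and `recall_at_effort`, the ranking rule, and the toy data are all illustrative assumptions.

```python
from math import sqrt


def g_mean(y_true, y_pred):
    """Geometric mean of recall on buggy (1) and clean (0) changes.

    Assumed definition; the abstract only names the measure.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(tpr * tnr)


def recall_at_effort(y_true, scores, loc, effort=0.2):
    """Share of buggy changes found within an LOC inspection budget.

    Changes are inspected in descending order of predicted score
    (an assumed ranking rule) and inspection stops before the first
    change that would exceed `effort` of the total LOC.
    """
    total_bugs = sum(y_true)
    if total_bugs == 0:
        return 0.0
    budget = effort * sum(loc)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    spent = 0
    found = 0
    for i in order:
        if spent + loc[i] > budget:
            break
        spent += loc[i]
        found += y_true[i]
    return found / total_bugs


# Toy test set: labels stand in for the RA-SZZ labels used for evaluation.
test_labels = [1, 0, 1, 0, 1]
test_loc = [10, 5, 20, 45, 20]

# Hypothetical scores from a model trained on some SZZ variant's labels.
model_scores = [0.9, 0.8, 0.7, 0.2, 0.1]
model_preds = [1 if s >= 0.5 else 0 for s in model_scores]

gm = g_mean(test_labels, model_preds)
r20 = recall_at_effort(test_labels, model_scores, test_loc, effort=0.2)
print(f"G-mean={gm:.3f}  Recall@20%={r20:.3f}")
```

Because every model is scored against the same test labels, running these functions on the predictions of models trained with different SZZ variants gives directly comparable numbers.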
