Abstract:
Context: Software defect prediction (SDP) is an important challenge in software engineering, and much research has been conducted on it, most notably through the use of machine learning algorithms. However, class imbalance, typified by few defective components and many non-defective ones, is a common occurrence that causes difficulties for these methods. Imbalanced learning aims to deal with this problem and has recently been deployed by some researchers, unfortunately with inconsistent results.
Objective: We conduct a comprehensive experiment to explore (a) the basic characteristics of this problem; (b) the effect of imbalanced learning and its interactions with (i) data imbalance, (ii) type of classifier, (iii) input metrics and (iv) imbalanced learning method.
Method: We systematically evaluate 27 data sets, 7 classifiers, 7 types of input metrics and 17 imbalanced learning methods (including doing nothing) using an experimental design that enables exploration of interactions between these factors and individual imbalanced learning algorithms. This yields 27 × 7 × 7 × 17 = 22,491 results. The Matthews correlation coefficient (MCC) is used as an unbiased performance measure (unlike the more widely used F1 and AUC measures).
Results: (a) we found that a large majority (87 percent) of 106 public domain data sets exhibit a moderate or low level of imbalance (imbalance ratio < 10; median = 3.94); (b) anything other than low levels of imbalance clearly harms the performance of traditional learning for SDP; (c) imbalanced learning is more effective on the data sets with moderate or higher imbalance, however negative results are always possible; (d) the type of classifier has the most impact on the improvement in classification performance, followed by the imbalanced learning method itself, while the type of input metrics is not influential; (e) only ~52% of the combinations of imbalanced learner and classifier have a significant positive effect.
Conclusion: This paper offers two practical guidelines. First, imbalanced learning should only be considered for moderate or highly imbalanced SDP data sets. Second, the appropriate combination of imbalanced method and classifier needs to be carefully chosen to ameliorate the imbalanced learning problem for SDP. In contrast, the indiscriminate application of imbalanced learning can be harmful.

An Experimental Evaluation of Imbalanced Learning and Time-Series Validation in the Context of CI/CD Prediction

A novel lifelong machine learning-based method to eliminate calibration drift in clinical prediction models.

The Why, When, What, and How about Predictive Continuous Integration: A Simulation-Based Investigation

A Novel Imbalanced Data Classification Method Based on Weakly Supervised Learning for Fault Diagnosis

Revisiting Machine Learning based Test Case Prioritization for Continuous Integration

Could We Predict the Result of a Continuous Integration Build? An Empirical Study

Empirical Analysis on CI/CD Pipeline Evolution in Machine Learning Projects

Cutting the Software Building Efforts in Continuous Integration by Semi-Supervised Online AUC Optimization.

How do Machine Learning Projects use Continuous Integration Practices? An Empirical Study on GitHub Actions

Confidence Interval Estimation of Predictive Performance in the Context of AutoML

Estimating Model Performance under Domain Shifts with Class-Specific Confidence Scores

Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous Yet Practical Treatment

Impact of train/test sample regimen on performance estimate stability of machine learning in cardiovascular imaging

Continual learning on deployment pipelines for Machine Learning Systems

A Procedure to Continuously Evaluate Predictive Performance of Just-In-Time Software Defect Prediction Models During Software Development

Iterative Metric Learning for Imbalance Data Classification

An empirical study of testing machine learning in the wild

Commit-time defect prediction using one-class classification

A Machine Learning Approach to Improve the Detection of CI Skip Commits

A Comprehensive Investigation of the Role of Imbalanced Learning for Software Defect Prediction

Detecting Continuous Integration Skip: A Reinforcement Learning-based Approach