Abstract: Software defect prediction aims to automatically predict defect-prone code regions before defects are discovered. Accurate prediction helps software practitioners prioritize their testing efforts. In recent decades, dozens of approaches have been proposed in this field and have achieved good results. However, in practical scenarios, many projects have few labeled instances, and most of these labeled instances are nondefective. Together, the lack of training data and the class imbalance problem pose serious challenges to software defect prediction. So far, few prevailing approaches can handle both difficulties well at the same time. One important reason is that they do not pay adequate attention to the key instances that are difficult to classify in a small, imbalanced dataset. This article introduces the concept of "instance hardness" to unify the various difficulties of imbalanced classification tasks. Based on it, a novel imbalance learning framework named self-paced ensemble of ensembles (SPE$^{2}$) is proposed for software defect prediction.
SPE$^{2}$ aims to build a strong ensemble of ensembles by harmonizing instance hardness in a self-paced manner through undersampling. Finally, SPE$^{2}$ is extensively compared with eight imbalance learning approaches on ten open-source defect datasets.
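The self-paced undersampling idea can be illustrated with a minimal Python sketch. It assumes the scheme of the original self-paced ensemble (SPE), which SPE$^{2}$ presumably extends. Under that scheme, the instance hardness of each nondefective (majority) instance is the current ensemble's predicted defect probability. Majority instances are grouped into equal-width hardness bins, and each bin is sampled in proportion to 1/(h_l + α), where h_l is the bin's mean hardness and α = tan(iπ/2n) grows with round i of n. The function name `self_paced_undersample`, the bin count, and the small ε guard are illustrative assumptions, not details given in the abstract.

```python
import math
import random


def self_paced_undersample(hardness, n_minority, iteration, n_iterations,
                           k_bins=10, rng=None):
    """Hypothetical helper: pick majority-class indices for one round.

    Assumption (from the SPE scheme, not confirmed for SPE^2):
    hardness[j] = |p_j - y_j| with y_j = 0, i.e. the current ensemble's
    predicted defect probability for nondefective instance j.
    """
    if rng is None:
        rng = random.Random(0)
    lo, hi = min(hardness), max(hardness)
    width = (hi - lo) / k_bins or 1.0  # avoid zero width if all values equal

    # Group majority instances into equal-width hardness bins
    bins = [[] for _ in range(k_bins)]
    for j, h in enumerate(hardness):
        bins[min(int((h - lo) / width), k_bins - 1)].append(j)

    # Self-paced factor: 0 in the first round, growing large near the last
    alpha = math.tan(math.pi * iteration / (2 * n_iterations))
    eps = 1e-6  # assumed guard against division by zero when alpha = 0

    # Unnormalized bin weights 1 / (mean hardness + alpha); empty bins get 0
    weights = []
    for members in bins:
        if members:
            mean_h = sum(hardness[j] for j in members) / len(members)
            weights.append(1.0 / (mean_h + alpha + eps))
        else:
            weights.append(0.0)
    total = sum(weights)

    # Draw about n_minority instances, split across bins by weight
    selected = []
    for members, w in zip(bins, weights):
        n_take = min(len(members), round(n_minority * w / total))
        selected.extend(rng.sample(members, n_take))
    return sorted(selected)


if __name__ == "__main__":
    # Made-up hardness values for 8 nondefective instances (illustration only)
    hardness = [0.05, 0.1, 0.2, 0.9, 0.95, 0.02, 0.3, 0.6]
    early = self_paced_undersample(hardness, 4, iteration=0, n_iterations=10, k_bins=4)
    late = self_paced_undersample(hardness, 4, iteration=9, n_iterations=10, k_bins=4)
    print("early round:", early, "late round:", late)
```

In early rounds (α near 0), most instances come from easy, densely populated bins, so each bin contributes a similar amount of total hardness. In late rounds, the weights flatten, and roughly one instance per bin is kept, which relatively emphasizes the sparse hard bins. In an SPE-style ensemble, each round's sample would be combined with all defective instances to train one base learner.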
Experiments indicate that SPE$^{2}$ improves prediction performance and achieves significantly better F-measure values than its existing counterparts, according to Brunner's statistical significance test and Cliff's effect sizes.
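The evaluation quantities named in the abstract can be sketched in standard-library Python. The first is the F-measure, the harmonic mean of precision and recall for the defective class. The second is Cliff's delta, δ = (#{x > y} − #{x < y}) / (mn), computed over all pairs of per-dataset scores from two methods. The magnitude thresholds (0.147, 0.33, 0.474) are a commonly used convention and an assumption here. The score lists in the usage lines are made-up illustrations, not results from the study.

```python
def f_measure(y_true, y_pred, positive=1):
    """F1 score for the defective (positive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision/recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cliffs_delta(xs, ys):
    """Cliff's delta: P(X > Y) - P(X < Y), estimated over all m*n pairs."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def delta_magnitude(delta):
    """Label |delta| using commonly cited thresholds (assumed convention)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Made-up per-dataset F-measures for two methods (illustration only)
    method_a = [0.62, 0.58, 0.71]
    method_b = [0.55, 0.60, 0.49]
    d = cliffs_delta(method_a, method_b)
    print(round(d, 3), delta_magnitude(d))  # prints: 0.778 large
```

Cliff's delta complements a significance test by reporting how often one method's score exceeds the other's, independent of sample size. For the test itself, SciPy provides `scipy.stats.brunnermunzel(x, y)`. That function implements the Brunner-Munzel test, one common form of Brunner's test.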

SDPERL: A Framework for Software Defect Prediction Using Ensemble Feature Extraction and Reinforcement Learning

Enhancing software defect prediction: a framework with improved feature selection and ensemble machine learning

TLEL: A Two-Layer Ensemble Learning Approach for Just-in-time Defect Prediction

Software Defect Prediction Using an Intelligent Ensemble-Based Model

Deep Learning for Just-In-Time Defect Prediction

SPE$^{2}$: Self-Paced Ensemble of Ensembles for Software Defect Prediction

A novel defect prediction method based on semantic feature enhancement

Software Defect Prediction Using Deep Q‐Learning Network‐Based Feature Extraction

An Approach to Semantic and Structural Features Learning for Software Defect Prediction

A Hybrid Sampling and Multi-Objective Optimization Approach for Enhanced Software Defect Prediction

EFSPredictor: Predicting Configuration Bugs with Ensemble Feature Selection

Optimized Deeplearning Algorithm for Software Defects Prediction

Software visualization and deep transfer learning for effective software defect prediction

Improving Software Defect Prediction With a Combination of Feature Selection Based On Ant Colony Optimization and Ensemble Technique

Software defect prediction employing BiLSTM and BERT-based semantic feature

A hybrid‐ensemble model for software defect prediction for balanced and imbalanced datasets using AI‐based techniques with feature preservation: SMERKP‐XGB

Deep Semantic Feature Learning with Embedded Static Metrics for Software Defect Prediction

Software defect prediction based on nested-stacking and heterogeneous feature selection

Ensemble Machine Learning Paradigms in Software Defect Prediction

ELM and KELM based software defect prediction using feature selection techniques

A Software Defect Prediction Approach Based on Hybrid Feature Dimensionality Reduction