Comparative Study of Ensemble Learning Methods in Just-in-time Software Defect Prediction

Ming Wen, Wenjing Zhang, Ruiqi He, Yong Li
DOI: https://doi.org/10.1109/QRS-C60940.2023.00059
2023-10-22
Abstract: Just-in-time software defect prediction (JIT-SDP) is a key means of ensuring software safety and quality, and it is receiving increasing attention in software engineering. However, existing JIT-SDP methods perform insufficiently well and suffer from class imbalance in the data. To address these problems, this paper proposes a class-imbalance-aware JIT-SDP method based on ensemble learning. First, four common oversampling methods (RandomOverSampler, SMOTE, Borderline-SMOTE, and ADASYN) and four classic ensemble learning methods (Bagging, Random Forest, AdaBoost, and GBDT) are selected, and each oversampling method is paired with each ensemble learning method to form 16 combinations. Second, for each combination, the oversampling method is applied to the raw JIT-SDP data n times to obtain n balanced datasets. Then the ensemble learning algorithm is trained on each of the n balanced datasets, yielding n independent JIT-SDP models. Finally, these n models are combined into a single ensemble model that predicts defects in new code changes. The experimental results show that RandomOverSampler handles class imbalance better than the other oversampling methods, and that its combination with Random Forest (RF) achieves the best defect prediction performance among the 16 combinations, making it suitable for practical JIT-SDP applications. With 15 base models, the method achieves the best balance between model performance and computing resource consumption.
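The pipeline described in the abstract (oversample n times, train n models, combine them) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a one-feature threshold classifier stands in for the Random Forest learner, the n models are combined by majority vote (the abstract does not say how they are combined), and the two features and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, rng):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def fit_stump(X, y):
    """Fit a one-feature threshold classifier (assumed stand-in for the real learner)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for above in (0, 1):
                preds = [above if row[f] > t else 1 - above for row in X]
                correct = sum(p == lab for p, lab in zip(preds, y))
                if best is None or correct > best[0]:
                    best = (correct, f, t, above)
    _, f, t, above = best
    return lambda row: above if row[f] > t else 1 - above


def train_ensemble(X, y, n_models=15, seed=0):
    """Build n balanced datasets by independent oversampling and train one model on each."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        X_bal, y_bal = random_oversample(X, y, rng)
        models.append(fit_stump(X_bal, y_bal))
    return models


def predict(models, row):
    """Combine the n models by majority vote (an assumption, see the lead-in)."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: [lines_added, files_touched]; 1 = defect-inducing change.
X = [[5, 1], [8, 1], [12, 2], [7, 1], [10, 2], [6, 1], [9, 1], [11, 2],
     [120, 9], [95, 7]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

models = train_ensemble(X, y, n_models=15)
print(predict(models, [100, 8]))  # large change
print(predict(models, [6, 1]))    # small change
```

Using an odd number of models, such as the 15 reported in the abstract, avoids tied votes in the binary case. In practice the oversampler and learner would come from libraries such as imbalanced-learn and scikit-learn rather than being hand-written.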
Computer Science