Abstract: Bug fixing is one of the most important activities in software development and maintenance. Bugs are reported, recorded, and managed in bug tracking systems such as Bugzilla. In general, a bug report contains many fields, such as product, component, severity, priority, fixer, operating system (OS), and platform, which provide important information for the bug triaging and fixing process. Our previous study found that approximately 80% of bug reports have their fields reassigned and refined at least once, and that bugs with reassigned and refined fields take more time to fix than bugs without them. Thus, automatically predicting which bug report fields will be reassigned and refined could help developers save bug fixing time. Because a single bug report can have multiple field reassignments and refinements (e.g., its product, component, fixer, and other fields can all be reassigned and refined), in this paper we propose a multi-label learning algorithm to predict which bug report fields might be reassigned and refined. We note that bug reports with some types of reassignment and refinement (e.g., bugs whose severity field gets reassigned and refined) make up only a small proportion of the whole bug report collection, which indicates a class imbalance problem. Thus, we propose imbalanced ML.KNN (Im-ML.KNN), which extends ML.KNN, one of the state-of-the-art multi-label learning algorithms, to achieve better performance. Im-ML.KNN is a composite model that combines 3 multi-label classifiers built using different types of features (i.e., meta, textual, and mixed features). We evaluate our solution on 4 large bug report datasets (OpenOffice, Netbeans, Eclipse, and Mozilla) containing a total of 190,558 bug reports. We show that Im-ML.KNN can achieve an average F-measure score of 0.56-0.62. We also compare Im-ML.KNN with other state-of-the-art methods, such as the method proposed by Lamkanfi et al., ML.KNN, and HOMER-NB. The results show that Im-ML.KNN, on average, improves the average F-measure scores of Lamkanfi et al.'s method, ML.KNN, and HOMER-NB by 119.69%, 9.11%, and 161.08%, respectively.
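To make the idea of a composite multi-label model over three feature views more concrete, the following is a minimal sketch and not the Im-ML.KNN algorithm itself. It assumes a simplified neighbor-vote scorer in place of ML.KNN's probabilistic estimation, equal-weight averaging of the three views, a fixed 0.5 threshold, and no imbalance handling. All function names, feature values, and labels are hypothetical toy data.

```python
from math import dist

def knn_label_scores(train_X, train_Y, x, k=2):
    """Per-label fraction of the k nearest training reports that carry the label.
    Simplified stand-in for ML.KNN (assumption): plain neighbor voting."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    n_labels = len(train_Y[0])
    return [sum(train_Y[i][j] for i in nearest) / k for j in range(n_labels)]

def composite_predict(views_train, train_Y, views_query, k=2, threshold=0.5):
    """Average per-label scores across feature views, then threshold.
    Equal weights and the 0.5 threshold are illustrative assumptions."""
    per_view = [knn_label_scores(Xv, train_Y, xq, k)
                for Xv, xq in zip(views_train, views_query)]
    n_labels = len(train_Y[0])
    avg = [sum(s[j] for s in per_view) / len(per_view) for j in range(n_labels)]
    return [int(a >= threshold) for a in avg], avg

# Hypothetical toy data: 4 bug reports, 3 labels meaning
# [product reassigned, component reassigned, severity reassigned].
train_meta = [[0, 0], [0, 1], [5, 5], [5, 6]]   # e.g. encoded meta fields
train_text = [[1, 0], [1, 1], [9, 9], [9, 8]]   # e.g. textual feature vectors
train_mixed = [m + t for m, t in zip(train_meta, train_text)]  # concatenated view
train_Y = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 1, 1]]

query_meta, query_text = [0, 0.5], [1, 0.5]
query_mixed = query_meta + query_text

labels, scores = composite_predict(
    [train_meta, train_text, train_mixed], train_Y,
    [query_meta, query_text, query_mixed])
print(labels, scores)
```

In this toy run the severity label is rare (one of four training reports), which mirrors the class imbalance the abstract describes; a plain neighbor vote like this one tends to under-predict such labels, which is the kind of weakness an imbalance-aware extension is meant to address.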

SMOTE-RkNN: A hybrid re-sampling method based on SMOTE and reverse k-nearest neighbors

A Classfication Method For Imbalance Data Set Based on Kernel SMOTE

Automated Bug Report Field Reassignment and Refinement Prediction

A Novel Hybrid Sampling Framework for Imbalanced Learning

Oversampling for Imbalanced Learning Based on K-Means and SMOTE

Over-sampling algorithm for imbalanced data classification

SMOTE-WENN: Solving class imbalance and small sample problems by oversampling and distance scaling

DDSC-SMOTE: an imbalanced data oversampling algorithm based on data distribution and spectral clustering

Learning class-imbalanced data with region-impurity synthetic minority oversampling technique

SMOTE: Synthetic Minority Over-sampling Technique

PF-SMOTE: A Novel Parameter-Free SMOTE for Imbalanced Datasets

An oversampling FCM-KSMOTE algorithm for imbalanced data classification

SP-SMOTE: A novel space partitioning based synthetic minority oversampling technique

A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios

Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants

Adaptive over-sampling method for classification with application to imbalanced datasets in aluminum electrolysis

A Classification Method for Imbalanced Data Based on Smote and Fuzzy Rough Nearest Neighbor Algorithm

CBReT: A Cluster-Based Resampling Technique for dealing with imbalanced data in code smell prediction

ISMOTE: A More Accurate Alternative for SMOTE

A cluster-based SMOTE both-sampling (CSBBoost) ensemble algorithm for classifying imbalanced data

FCM-CSMOTE: Fuzzy C-Means Center-SMOTE