An Enhanced Tree Ensemble for Classification in the Presence of Extreme Class Imbalance

Samir K. Safi, Sheema Gul
DOI: https://doi.org/10.3390/math12203243
IF: 2.4
2024-10-17
Mathematics
Abstract: Researchers using machine learning methods for classification can face challenges due to class imbalance, where a certain class is underrepresented. Over-sampling minority class observations, under-sampling majority class observations, or relying solely on model selection for ensemble methods may prove ineffective when the class imbalance ratio is extremely high. To address this issue, this paper proposes a method called enhanced tree ensemble (ETE), which combines synthetic data generation for minority class observations with tree selection based on each tree's performance on the training data. The proposed method first generates minority class instances to balance the training data and then selects trees using either out-of-bag observations (ETEOOB) or sub-samples (ETESS). The efficacy of the proposed method is assessed on twenty benchmark problems for binary classification with moderate to extreme class imbalance, comparing it against other well-known methods: optimal tree ensemble (OTE), SMOTE random forest (RFSMOTE), oversampling random forest (RFOS), under-sampling random forest (RFUS), k-nearest neighbor (k-NN), support vector machine (SVM), tree, and artificial neural network (ANN). Classification error rate and precision are used as performance metrics. The analyses show that the proposed method, which combines data balancing with model selection, yielded better results than the other methods.
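The two-stage idea in the abstract (balance the training data with synthetic minority instances, then keep only the trees that perform well on out-of-bag observations) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It rests on several assumptions: decision stumps stand in for full trees, synthetic points are drawn by interpolating between random pairs of minority points (a simplified SMOTE-like scheme without nearest-neighbour search), and trees are ranked by out-of-bag error with a fixed top fraction retained. All function names and the parameters `n_trees` and `keep` are hypothetical.

```python
import random

def oversample_minority(X, y, minority=1, seed=0):
    """Add interpolated minority points until both classes have equal size.
    Assumption: random pairs of minority points, no nearest-neighbour search."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_pts)) - len(minority_pts)
    X_new, y_new = list(X), list(y)
    for _ in range(max(0, n_needed)):
        a, b = rng.choice(minority_pts), rng.choice(minority_pts)
        t = rng.random()
        X_new.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        y_new.append(minority)
    return X_new, y_new

def predict_stump(stump, x):
    """Predict 1 when the chosen feature lies on the 'sign' side of the threshold."""
    feature, thr, sign = stump
    return int(sign * (x[feature] - thr) > 0)

def fit_stump(X, y, idx, feature):
    """Pick the threshold and direction with the fewest errors on the rows in idx."""
    best_err, best_stump = None, None
    for thr in sorted({X[i][feature] for i in idx}):
        for sign in (1, -1):
            stump = (feature, thr, sign)
            err = sum(predict_stump(stump, X[i]) != y[i] for i in idx)
            if best_err is None or err < best_err:
                best_err, best_stump = err, stump
    return best_stump

def fit_selected_ensemble(X, y, n_trees=50, keep=0.2, seed=0):
    """Grow stumps on bootstrap samples, score each on its out-of-bag rows,
    and keep the top `keep` fraction (assumed selection rule)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    scored = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:
            continue  # no out-of-bag rows to score this stump on
        stump = fit_stump(X, y, boot, rng.randrange(d))
        oob_err = sum(predict_stump(stump, X[i]) != y[i] for i in oob) / len(oob)
        scored.append((oob_err, stump))
    scored.sort(key=lambda pair: pair[0])
    n_keep = max(1, int(keep * len(scored)))
    return [stump for _, stump in scored[:n_keep]]

def predict(ensemble, x):
    """Majority vote over the selected stumps."""
    votes = sum(predict_stump(stump, x) for stump in ensemble)
    return int(2 * votes > len(ensemble))
```

A typical call sequence would be `Xb, yb = oversample_minority(X, y)` followed by `ensemble = fit_selected_ensemble(Xb, yb)` and `predict(ensemble, x)` for a new observation `x`. Balancing before growing the trees means every bootstrap sample, and therefore every out-of-bag set, contains enough minority observations for the out-of-bag error to be informative.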