Class Imbalance Problem: A Wrapper-Based Approach using Under-Sampling with Ensemble Learning

Riyaz Sikora, Yoon Sang Lee
DOI: https://doi.org/10.1007/s10796-024-10533-7
2024-09-02
Information Systems Frontiers
Abstract: Imbalanced data sets are a growing problem in data mining and business analytics, because the ability of machine learning algorithms to predict the minority class deteriorates in the presence of class imbalance. Although many approaches to the imbalance problem have been studied in the literature, most have met with limited success. In this study, we propose three methods based on a wrapper approach that combine under-sampling with ensemble learning to improve the performance of standard data mining algorithms. We test our ensemble methods on 10 data sets collected from the UCI repository with an imbalance ratio of at least 70%. We compare their performance with two other traditional techniques for dealing with the imbalance problem and show significant improvement in recall, AUROC, and the average of precision and recall.
computer science, information systems, theory & methods
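The general idea named in the abstract, combining under-sampling with ensemble learning, can be illustrated with a minimal sketch. This is not a reconstruction of the paper's three wrapper methods, which the abstract does not detail. It assumes a binary problem, a toy nearest-centroid base learner, and simple majority voting. All function names (`undersample`, `fit_ensemble`, `predict_ensemble`, `recall`) are hypothetical and chosen for illustration only.

```python
import random
from collections import Counter

def undersample(X, y, rng):
    """Keep every minority row plus an equal-sized random draw of majority rows.
    Assumes a binary problem (an assumption of this sketch)."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    keep = min_idx + rng.sample(maj_idx, len(min_idx))
    return [X[i] for i in keep], [y[i] for i in keep]

def fit_centroids(X, y):
    """Toy base learner (assumed, not from the paper): one mean vector per class."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroids(model, row):
    """Predict the class whose centroid is nearest in squared Euclidean distance."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: dist2(model[label]))

def fit_ensemble(X, y, n_members=11, seed=0):
    """Train each ensemble member on its own balanced, under-sampled subset."""
    rng = random.Random(seed)
    return [fit_centroids(*undersample(X, y, rng)) for _ in range(n_members)]

def predict_ensemble(models, row):
    """Combine member predictions by majority vote."""
    votes = Counter(predict_centroids(m, row) for m in models)
    return votes.most_common(1)[0][0]

def recall(y_true, y_pred, positive):
    """Fraction of actual positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0
```

Each member sees all minority rows but a different random slice of the majority class, so the ensemble as a whole uses more of the majority data than any single under-sampled model would. Treating the minority class as `positive` in `recall` gives the minority-class recall that the abstract reports as one of its evaluation measures.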