A majority affiliation based under-sampling method for class imbalance problem

Ying Xie, Xian Huang, Feng Qin, Fagen Li, Xuyang Ding
DOI: https://doi.org/10.1016/j.ins.2024.120263
IF: 8.1
2024-02-02
Information Sciences
Abstract: Class imbalance makes it difficult to train a classifier that performs well on minority classes, especially when the imbalance ratio is high and class overlap is significant. Existing data-level methods often suffer from information loss and overfitting. To address these problems, we introduce a novel majority affiliation based under-sampling method (MAUS). MAUS employs a support vector data description model to capture the distribution of the minority class, forming a hyper-sphere that establishes a majority affiliation for each sample. Because the high-dimensional hyper-sphere is constructed from all minority class samples, it avoids the problem of overfitting. Using the majority affiliation in conjunction with the k-nearest neighbor algorithm, MAUS identifies regions of class overlap and removes the majority samples within these regions that negatively impact classification performance. This selective removal alleviates class overlap while limiting information loss at classification boundaries. Furthermore, by removing majority samples that are situated far from the classification boundary, MAUS reduces the imbalance ratio to the expected value, yielding a balanced dataset. To validate the effectiveness of our method, we conducted extensive experiments comparing it with state-of-the-art methods on 30 publicly available datasets. The results indicate that our approach outperforms existing methods on most datasets and classifiers.
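The two removal stages described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation. The abstract does not give the affiliation formula or the overlap rule, so everything below is an assumption: the support vector data description model is replaced by a much cruder sphere (minority centroid plus largest minority distance), the affiliation score is taken to be the distance to the centre divided by the radius, and a majority sample counts as "in the overlap region" when it lies inside the sphere and most of its k nearest neighbours are minority samples. The names `fit_sphere` and `undersample` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_sphere(minority):
    """Crude stand-in for SVDD (assumption): centroid and max-distance radius."""
    dim = len(minority[0])
    center = [sum(p[i] for p in minority) / len(minority) for i in range(dim)]
    # Guard against a zero radius when all minority points coincide.
    radius = max(dist(p, center) for p in minority) or 1e-12
    return center, radius


def undersample(majority, minority, k=3, target_ratio=1.0):
    """Return the majority samples kept after the two removal stages."""
    center, radius = fit_sphere(minority)
    # Hypothetical affiliation score: <= 1 means inside the minority sphere.
    affiliation = [dist(p, center) / radius for p in majority]

    # Majority samples come first, so index i here matches majority[i].
    labelled = [(p, 0) for p in majority] + [(p, 1) for p in minority]

    def minority_neighbours(idx):
        """Count minority samples among the k nearest neighbours of majority[idx]."""
        p = majority[idx]
        others = [(dist(p, q), lab)
                  for j, (q, lab) in enumerate(labelled) if j != idx]
        others.sort(key=lambda t: t[0])
        return sum(lab for _, lab in others[:k])

    # Stage 1: drop majority samples in the assumed overlap region.
    keep = [i for i in range(len(majority))
            if not (affiliation[i] <= 1.0 and 2 * minority_neighbours(i) > k)]

    # Stage 2: drop the samples farthest from the sphere until the
    # majority/minority ratio reaches target_ratio.
    target = max(int(target_ratio * len(minority)), 1)
    keep.sort(key=lambda i: affiliation[i])
    return [majority[i] for i in keep[:target]]
```

Calling `undersample(majority, minority, k=3, target_ratio=1.0)` with lists of coordinate tuples returns the retained majority samples, ordered from nearest to farthest from the minority sphere. A kernel-based SVDD would fit non-spherical minority distributions far better than the centroid sphere used here.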
computer science, information systems