Oversampling With Reliably Expanding Minority Class Regions for Imbalanced Data Learning
Tuanfei Zhu, Xinwang Liu, En Zhu
DOI: https://doi.org/10.1109/tkde.2022.3171706
IF: 9.235
2023-05-02
IEEE Transactions on Knowledge and Data Engineering
Abstract: This paper proposes a simple interpolation-based Oversampling method that Reliably Expands the Minority class regions (OREM). OREM first finds a candidate minority region around each original minority sample, then uses this region to identify clean subregions that contain no majority samples. Synthetic samples are generated only in these clean subregions, so that the minority class regions are broadened reliably. Because learning from multiclass imbalanced data is more challenging than the two-class case, we also extend OREM to multiclass imbalance problems through an iterative procedure for generating synthetic samples, yielding the multiclass oversampling algorithm OREM-M. The key feature of OREM-M is that it reduces class overlap not only between synthetic minority samples and original samples, but also among the synthetic samples of different minority classes. In this way, OREM-M ensures that the data of each class can be modeled well after oversampling. In addition, we embed OREM into a boosting framework to develop a new ensemble method, OREMBoost, for class imbalance problems. Extensive experiments demonstrate the effectiveness of the proposed OREM, OREM-M, and OREMBoost.
computer science, information systems, artificial intelligence, engineering, electrical & electronic
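
The idea in the abstract of interpolating only inside regions free of majority samples can be illustrated with a small sketch. This is a simplified toy, not the authors' OREM algorithm: the abstract does not specify how candidate regions or clean subregions are constructed, so the sketch assumes (hypothetically) that the candidate region of a minority sample is its k nearest minority neighbours, and that a neighbour is "clean" when no majority sample falls inside the ball whose diameter is the segment joining the two points. The function name `clean_region_oversample` and all of its parameters are illustrative.

```python
import numpy as np


def clean_region_oversample(X, y, minority_label, n_new, k=5, seed=None):
    """Toy interpolation oversampler restricted to 'clean' segments.

    Assumptions (NOT taken from the paper):
      * candidate region = the k nearest minority neighbours of each seed;
      * a (seed, neighbour) pair is clean when no majority sample lies in
        the ball whose diameter is the segment between the two points.
    Returns an array of shape (n_new, n_features), or an empty array when
    no clean pair exists.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority_label]
    X_maj = X[y != minority_label]

    clean_pairs = []
    for i, point in enumerate(X_min):
        dist = np.linalg.norm(X_min - point, axis=1)
        dist[i] = np.inf  # never pair a sample with itself
        for j in np.argsort(dist)[:k]:
            if not np.isfinite(dist[j]):
                continue
            midpoint = (point + X_min[j]) / 2.0
            radius = dist[j] / 2.0
            # Clean when every majority sample is strictly outside the ball.
            if np.all(np.linalg.norm(X_maj - midpoint, axis=1) > radius):
                clean_pairs.append((i, j))

    if not clean_pairs:
        return np.empty((0, X.shape[1]))

    # Pick clean pairs at random and interpolate along each segment.
    picks = rng.integers(len(clean_pairs), size=n_new)
    starts = X_min[[clean_pairs[p][0] for p in picks]]
    ends = X_min[[clean_pairs[p][1] for p in picks]]
    gaps = rng.random((n_new, 1))
    return starts + gaps * (ends - starts)
```

Compared with plain nearest-neighbour interpolation, the only change is the cleanliness check before a pair is admitted, so a segment that passes near a majority sample never receives synthetic points.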