Local Distribution-Based Adaptive Minority Oversampling for Imbalanced Data Classification

Xinyue Wang, Jian Xu, Tieyong Zeng, Liping Jing
DOI: https://doi.org/10.1016/j.neucom.2020.05.030
IF: 6
2021-01-01
Neurocomputing
Abstract: Imbalanced data classification is a challenging task that has drawn significant interest in numerous scientific areas. One popular strategy to balance the instance quantities between two classes is oversampling via generating synthetic instances. However, this strategy still faces two key questions: where synthetic instances should be generated, and how many. In this paper, we propose a Local distribution-based Adaptive Minority Oversampling method (LAMO) to deal with the imbalanced classification problem. LAMO first identifies the informative borderline minority instances as sampling seeds according to their neighbors and the corresponding class distribution. Then, LAMO captures the local distribution of each seed according to its Euclidean distances from the nearest majority instance and the nearest minority instance. Finally, LAMO generates synthetic instances around the seeds via a Gaussian Mixture Model (GMM). For each component of the GMM, the mixing coefficient and bandwidth are adaptively set with the aid of the seeds’ local distribution. Extensive experiments have been conducted on both simulated and real data sets under varying imbalance ratios and data sizes. Compared with state-of-the-art oversampling methods, the proposed LAMO obtains promising results in terms of several widely used evaluation metrics.
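The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the seed criterion, the mixing coefficients, or the bandwidths, so the choices below are assumptions made only for illustration. The function name lamo_like_oversample, the rule "some but not all of the k nearest neighbors are majority", the weight 1/d_maj, and the bandwidth 0.5 * min(d_maj, d_min) are all hypothetical. The sketch assumes at least two minority instances and one majority instance, and uses only the standard library.

```python
import math
import random


def lamo_like_oversample(X, y, minority=1, k=5, n_new=None, seed=0):
    """Return n_new synthetic minority points (a list of coordinate lists).

    Illustrative only: the seed rule, weights, and bandwidths are assumptions,
    not the formulas from the paper.
    """
    rng = random.Random(seed)
    min_idx = [i for i, c in enumerate(y) if c == minority]
    maj_idx = [i for i, c in enumerate(y) if c != minority]
    if n_new is None:
        # Default: generate enough points to balance the two classes.
        n_new = len(maj_idx) - len(min_idx)

    def dist(i, j):
        return math.dist(X[i], X[j])  # Euclidean distance

    # Step 1 (assumed rule): a minority instance is a borderline seed when
    # some, but not all, of its k nearest neighbors are majority instances.
    seeds = []
    for i in min_idx:
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: dist(i, j))
        n_maj = sum(y[j] != minority for j in others[:k])
        if 0 < n_maj < k:
            seeds.append(i)
    if not seeds:
        seeds = list(min_idx)  # fallback: use every minority instance

    # Step 2: local distribution of each seed, i.e. its distance to the
    # nearest majority instance and to the nearest other minority instance.
    d_maj = [min(dist(i, j) for j in maj_idx) for i in seeds]
    d_min = [min(dist(i, j) for j in min_idx if j != i) for i in seeds]

    # Step 3: one isotropic Gaussian component per seed.
    # Assumed mixing coefficient: larger for seeds closer to the majority class.
    # Assumed bandwidth: half the distance to the nearest neighbor of either class.
    eps = 1e-12
    weights = [1.0 / (d + eps) for d in d_maj]
    sigmas = [0.5 * min(a, b) for a, b in zip(d_maj, d_min)]

    synthetic = []
    for _ in range(n_new):
        c = rng.choices(range(len(seeds)), weights=weights)[0]
        synthetic.append([rng.gauss(x, sigmas[c]) for x in X[seeds[c]]])
    return synthetic
```

Calling lamo_like_oversample(X, y) on a data set with 40 majority and 10 minority instances returns 30 synthetic points by default. Tying the bandwidth to the nearest-neighbor distances keeps most samples close to their seed, which is one plausible reading of "adaptively set"; the published method may use a different rule.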