Nearest neighbors and density-based undersampling for imbalanced data classification with class overlap

Peiqi Sun, Yanhui Du, Siyun Xiong
DOI: https://doi.org/10.1016/j.neucom.2024.128492
IF: 6
2024-09-02
Neurocomputing
Abstract: When addressing imbalanced data classification, most existing resampling methods focus primarily on balancing the class distribution. However, they often overlook class overlap and fail to adequately consider the feature distributions of the different classes. Consequently, samples in overlap areas remain susceptible to misclassification after resampling, and overall performance does not improve substantially. To address these shortcomings, we propose a novel data resampling technique, Nearest Neighbors and Density-based Undersampling (NDU). This method uses within-class k-nearest neighbors and between-class probability densities to design a weight assignment strategy. Leveraging this strategy, we establish a dedicated metric, the F_factor, to evaluate the importance of majority class samples in overlap areas. NDU then applies a gradient-based segmented undersampling strategy, which undersamples majority class samples to varying degrees across segmented regions. Through experiments on binary imbalanced datasets with class overlap, we evaluate the effectiveness of diverse resampling methods in terms of classification performance. The results demonstrate that our proposed method effectively addresses class overlap challenges.
computer science, artificial intelligence
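The abstract names the ingredients of NDU (within-class k-nearest neighbors, between-class probability densities, a per-sample importance score, and undersampling applied at different strengths per segment) but does not give the formulas. The following is a minimal sketch of that general idea only, not the authors' method: the score `overlap_scores` is a hypothetical stand-in for the F_factor, the product used to combine the two terms is an assumption, and the function names, `bandwidth`, and `removal_rates` are all illustrative choices.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff * diff, axis=2)


def overlap_scores(X_maj, X_min, k=5, bandwidth=1.0):
    """Hypothetical stand-in for an overlap importance score (NOT the paper's F_factor).

    Higher values mark majority samples that sit in dense minority territory
    and are also far from their own majority-class neighbors.
    """
    # Within-class term: mean distance to the k nearest majority neighbors.
    d_maj = np.sqrt(pairwise_sq_dists(X_maj, X_maj))
    np.fill_diagonal(d_maj, np.inf)  # exclude each point from its own neighbors
    k = min(k, len(X_maj) - 1)
    knn_dist = np.sort(d_maj, axis=1)[:, :k].mean(axis=1)

    # Between-class term: unnormalized Gaussian kernel density of the
    # minority class, evaluated at each majority sample.
    sq = pairwise_sq_dists(X_maj, X_min)
    dens_min = np.exp(-sq / (2.0 * bandwidth ** 2)).mean(axis=1)

    # Assumed combination: product of the two terms, each scaled to [0, 1].
    eps = 1e-12
    return (dens_min / (dens_min.max() + eps)) * (knn_dist / (knn_dist.max() + eps))


def segmented_undersample(X_maj, scores, removal_rates=(0.0, 0.25, 0.5, 0.75), seed=0):
    """Split majority samples into score-ordered segments and remove a
    larger fraction from segments with higher scores."""
    rng = np.random.default_rng(seed)
    order = np.argsort(scores)  # lowest score first
    segments = np.array_split(order, len(removal_rates))
    keep = []
    for seg, rate in zip(segments, removal_rates):
        n_keep = len(seg) - int(rate * len(seg))
        keep.extend(rng.choice(seg, size=n_keep, replace=False))
    keep = np.sort(np.array(keep, dtype=int))
    return X_maj[keep], keep


# Synthetic binary data with overlap (illustrative only).
rng = np.random.default_rng(42)
X_maj = rng.normal(0.0, 1.0, size=(200, 2))
X_min = rng.normal(1.5, 0.5, size=(20, 2))

scores = overlap_scores(X_maj, X_min)
X_kept, kept_idx = segmented_undersample(X_maj, scores)
print(X_maj.shape, "->", X_kept.shape)
```

In this sketch the lowest-scoring segment is kept in full and the highest-scoring segment loses the most samples, which mirrors the abstract's description of varying degrees of undersampling across regions. The actual segment boundaries and rates used by NDU are defined in the paper.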
What problem does this paper attempt to address?