An adaptive over-sampling method for imbalanced data based on simultaneous clustering and filtering noisy

Wei Chen, Wenjie Guo, Weijie Mao
DOI: https://doi.org/10.1007/s10489-024-05754-x
IF: 5.3
2024-09-20
Applied Intelligence
Abstract: Imbalanced data classification is a prevalent concern in machine learning and data mining. However, conventional methods concentrate primarily on between-class imbalance and ignore noise, class overlap, and within-class imbalance. To address these issues, this study proposes a new adaptive over-sampling method for imbalanced data based on simultaneous clustering and filtering noisy (ASCFNO). First, the method develops a new clustering algorithm, DPINF (Density Peak Clustering with Improved Noise Filter), which identifies minority class sub-clusters of various sizes and densities while simultaneously filtering noisy instances. This handles the noise problem and prepares the data for the subsequent steps, which address between-class and within-class imbalance. Second, an adaptive strategy determines the over-sampling size for each minority class sub-cluster by assigning each sub-cluster a weight that accounts for several factors, which addresses within-class imbalance. Finally, new synthetic minority instances are generated between two instances located in the same sub-cluster, selected according to their probability distribution, which avoids the noisy or overlapping synthetic instances that the traditional SMOTE method can generate. The performance of ASCFNO was assessed on 32 benchmark imbalanced datasets. The experimental results demonstrate the effectiveness and feasibility of these improvements.
computer science, artificial intelligence
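The second and third steps described in the abstract (per-sub-cluster over-sampling sizes, then interpolation between two instances of the same sub-cluster) can be illustrated with a minimal sketch. This is not the authors' ASCFNO implementation. The abstract does not specify the weighting factors, the selection probabilities, or the DPINF algorithm, so the sketch makes loud assumptions: sub-cluster labels are supplied by the caller, a label of -1 marks instances already filtered as noise, weights are simply inversely proportional to sub-cluster size, and instance pairs are drawn uniformly. The function names `allocate_sizes` and `oversample_within_clusters` are hypothetical.

```python
import numpy as np


def allocate_sizes(labels, n_total):
    """Split n_total synthetic instances across sub-clusters.

    Assumption (not from the paper): a sub-cluster's weight is inversely
    proportional to its size, so sparse sub-clusters receive more samples.
    """
    ids, counts = np.unique(labels, return_counts=True)
    weights = (1.0 / counts) / np.sum(1.0 / counts)
    sizes = np.floor(weights * n_total).astype(int)
    # Hand out what flooring left over, highest-weight sub-clusters first.
    remainder = n_total - sizes.sum()
    sizes[np.argsort(-weights)[:remainder]] += 1
    return dict(zip(ids.tolist(), sizes.tolist()))


def oversample_within_clusters(X_min, labels, n_total, seed=None):
    """Generate synthetic minority instances inside each sub-cluster.

    Assumptions (not from the paper): label -1 marks filtered noise, and
    the two parent instances are drawn uniformly at random.
    """
    rng = np.random.default_rng(seed)
    keep = labels >= 0  # drop instances flagged as noise
    X_min, labels = X_min[keep], labels[keep]
    synthetic = []
    for cid, n_new in allocate_sizes(labels, n_total).items():
        members = X_min[labels == cid]
        if len(members) < 2:
            continue  # a singleton has no partner to interpolate with
        for _ in range(n_new):
            i, j = rng.choice(len(members), size=2, replace=False)
            gap = rng.random()  # position along the segment, in [0, 1)
            synthetic.append(members[i] + gap * (members[j] - members[i]))
    return np.array(synthetic).reshape(-1, X_min.shape[1])
```

Because both parents come from the same sub-cluster, every synthetic instance lies on a segment inside that sub-cluster, which is the property the abstract contrasts with traditional SMOTE. Note that in this sketch any allocation assigned to a singleton sub-cluster is skipped, so fewer than `n_total` instances may be returned.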