Abstract: Learning from imbalanced datasets is a nontrivial task for the supervised learning community. Traditional classifiers may have difficulty learning the concept of the minority class in imbalanced classification, and the problem worsens in the presence of other complicating factors such as class overlap, outliers, and small disjuncts. In this paper, we propose a self-adaptive oversampling algorithm based on the complexity of the minority data for imbalanced dataset classification. The algorithm first generates hyperspheres that together cover all minority instances, with radii determined by the imbalance ratio and the distances to the nearest enemy neighbors, subject to the constraint that no hypersphere contains a majority instance. Oversampling is then conducted only within these hyperspheres, so the synthetic minority instances cannot intrude into the majority space, which avoids introducing class overlap while achieving between-class balance. In addition, a self-adaptive strategy for assigning oversampling sizes is developed based on the minority data complexity: hyperspheres with small radii that contain few instances are given more chances to be oversampled. This strategy helps address outliers and small disjuncts, because the hyperspheres covering them are usually small and contain few instances; they therefore have more chances to generate synthetic instances, eliminating the within-class imbalance caused by lack of density. Moreover, since the hyperspheres covering boundary minority instances are relatively small and are thus assigned larger oversampling sizes, the proposed approach also strengthens the boundary information of the minority class, which benefits the subsequent learning task.
Extensive experimental results on various simulated and real-world imbalanced datasets show that the proposed method significantly outperforms other state-of-the-art oversampling methods.
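
The procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the radius or weighting formulas, so the code below makes several assumptions. It places one hypersphere on each minority instance, sets the radius to a fixed fraction (`shrink`, hypothetical) of the distance to the nearest majority instance instead of using the imbalance ratio, and weights each hypersphere by the inverse of its radius times its minority count. The function name `hypersphere_oversample` is likewise hypothetical.

```python
import numpy as np


def hypersphere_oversample(X_min, X_maj, shrink=0.9, seed=0):
    """Sketch of hypersphere-constrained oversampling (assumed formulas).

    Returns the synthetic minority points, the index of the minority
    instance each one was generated around, and the radii used.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)
    n_min, dim = X_min.shape

    # Distance from each minority instance to its nearest enemy (majority) instance.
    d_enemy = np.linalg.norm(
        X_min[:, None, :] - X_maj[None, :, :], axis=2
    ).min(axis=1)

    # Assumption: radius is a fixed fraction of that distance, so with
    # shrink < 1 no majority instance can lie inside the hypersphere.
    radii = shrink * d_enemy

    # Number of synthetic points needed to reach between-class balance.
    n_new = len(X_maj) - n_min
    if n_new <= 0:
        return np.empty((0, dim)), np.empty(0, dtype=int), radii

    # Number of minority instances inside each hypersphere (center included).
    d_mm = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    counts = (d_mm <= radii[:, None]).sum(axis=1)

    # Assumption: small, sparse hyperspheres get more chances to be sampled.
    weights = np.zeros(n_min)
    valid = radii > 0  # skip minority points that coincide with a majority point
    weights[valid] = 1.0 / (radii[valid] * counts[valid])
    if weights.sum() == 0:
        raise ValueError("no hypersphere with positive radius")
    weights /= weights.sum()

    # Pick a hypersphere for each synthetic point, then sample uniformly inside it.
    centers = rng.choice(n_min, size=n_new, p=weights)
    directions = rng.normal(size=(n_new, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    scale = radii[centers] * rng.random(n_new) ** (1.0 / dim)
    X_new = X_min[centers] + directions * scale[:, None]
    return X_new, centers, radii


# Toy usage on two Gaussian blobs (illustrative data only).
_rng = np.random.default_rng(1)
X_majority = _rng.normal(0.0, 1.0, size=(50, 2))
X_minority = _rng.normal(3.0, 0.5, size=(10, 2))
X_synth, center_idx, sphere_radii = hypersphere_oversample(X_minority, X_majority)
```

Because every synthetic point stays inside a hypersphere that excludes all majority instances, the sketch preserves the property the abstract emphasizes: oversampling balances the classes without pushing minority points into the majority region.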

Self-adaptive Oversampling Method Based on the Complexity of Minority Data in Imbalanced Datasets Classification
