Abstract: Most real-world data exhibit class imbalance. Neural networks are one of the classic models used to classify imbalanced data. However, class imbalance often causes a neural network to favor the negative class. One way to alleviate the imbalance is to reconstruct a balanced dataset with an undersampling strategy. Most existing undersampling methods either focus on the data alone or aim to preserve the overall structure of the negative class through potential energy estimation. Meanwhile, the problems of gradient inundation and insufficient empirical representation of positive samples have received little attention. We therefore propose a new paradigm for solving the data imbalance problem. To address gradient inundation, we derive an informative undersampling strategy from the observed performance degradation and use it to restore the ability of neural networks to work on imbalanced data. To alleviate the insufficient empirical representation of positive samples, we add a boundary expansion strategy that combines linear interpolation with a prediction consistency constraint. We tested the proposed paradigm on 34 imbalanced datasets with imbalance ratios ranging from 16.90 to 100.14. Our paradigm achieved the best area under the receiver operating characteristic curve (AUC) on 26 of them.
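The abstract does not say how the boundary expansion is implemented. The sketch below is a minimal illustration under our own assumptions and is not the authors' method. Synthetic positives are drawn on line segments between random pairs of positive samples, which is a SMOTE/mixup-style linear interpolation. The prediction consistency constraint is modeled as a squared penalty between the prediction at the interpolated point and the same interpolation of the predictions at its two endpoints. The names `interpolate`, `expand_positives`, `consistency_loss` and `toy_model` are hypothetical. `toy_model` is a fixed logistic stand-in for a trained network.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def interpolate(a: Sequence[float], b: Sequence[float], lam: float) -> Vector:
    """Element-wise linear interpolation lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def expand_positives(
    positives: Sequence[Sequence[float]], n_new: int, rng: random.Random
) -> List[Tuple[Vector, int, int, float]]:
    """Hypothetical boundary expansion: create n_new synthetic positives on the
    segments between randomly chosen pairs of distinct positive samples.
    Returns (synthetic_point, i, j, lam) so a consistency term can be computed."""
    if len(positives) < 2:
        raise ValueError("need at least two positive samples to interpolate")
    out = []
    for _ in range(n_new):
        i, j = rng.sample(range(len(positives)), 2)  # two distinct indices
        lam = rng.random()  # interpolation weight in [0, 1)
        out.append((interpolate(positives[i], positives[j], lam), i, j, lam))
    return out


def consistency_loss(
    model: Callable[[Sequence[float]], float],
    positives: Sequence[Sequence[float]],
    synthetic: List[Tuple[Vector, int, int, float]],
) -> float:
    """Assumed consistency constraint: mean squared gap between the prediction
    on each interpolated point and the interpolated endpoint predictions."""
    total = 0.0
    for x_mix, i, j, lam in synthetic:
        target = lam * model(positives[i]) + (1.0 - lam) * model(positives[j])
        total += (model(x_mix) - target) ** 2
    return total / len(synthetic)


def toy_model(x: Sequence[float]) -> float:
    """Stand-in classifier: logistic score of a fixed linear function."""
    return 1.0 / (1.0 + math.exp(-(2.0 * x[0] - 1.0 * x[1])))


if __name__ == "__main__":
    rng = random.Random(0)
    pos = [[1.0, 0.0], [2.0, 1.0], [1.5, -0.5]]
    syn = expand_positives(pos, n_new=5, rng=rng)
    print(len(syn), consistency_loss(toy_model, pos, syn))
```

For a linear scorer the penalty is zero (up to rounding), because a linear function commutes with interpolation. Only a nonlinear model such as `toy_model` is actually constrained. In training, such a term would typically be added to the classification loss with a weighting coefficient. That coefficient, the pairing rule and the sampling range of `lam` are all further assumptions.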

An Imbalanced Data Classification Algorithm Based on Boosting

Imbalanced Data Sets Classification Method Based on Over-Sampling Technique

Imbalanced Data Classification Algorithm Based on Integrated Sampling and Ensemble Learning

Imbalanced Data Classification Active Learning Algorithm Based on Boosting

Unbalanced Data Classification Based on Oversampling and Integrated Learning

Majority-to-minority Resampling for Boosting-Based Classification under Imbalanced Data

An Improved AdaBoost Algorithm for Unbalanced Classification Data

A New Sampling Approach for Classification of Imbalanced Data Sets with High Density

RBSP-Boosting: A Shapley Value-Based Resampling Approach for Imbalanced Data Classification

The improved AdaBoost algorithms for imbalanced data classification

A SVM Classifier for Imbalanced Datasets Based on SMOTEBoost

A weighted hybrid ensemble method for classifying imbalanced data

Novel modified AdaBoost algorithm for imbalanced data classification

An automatic sampling ratio detection method based on genetic algorithm for imbalanced data classification

WOTBoost: Weighted Oversampling Technique in Boosting for imbalanced learning

Algorithm of Partition Based Network Boosting for Imbalanced Data Classification

Dbboost: Enhancing Imbalanced Classification by A Novel Ensemble Based Technique

Boosting Prediction Performance on Imbalanced Dataset

Multi-Class Imbalance Classification Based on Data Distribution and Adaptive Weights

BalancedBoost: A hybrid approach for real-time network traffic classification

Neural Network with a Preference Sampling Paradigm for Imbalanced Data Classification