Abstract: In practical applications, imbalanced datasets significantly degrade the classification performance of machine learning models. However, most conventional resampling approaches do not adequately account for the differing contributions of individual features to the classification model. To address this shortcoming, this study introduces three novel resampling approaches. The first, Oversampling based on class instance density per feature value intervals (OCF), augments the dataset. The second, Undersampling based on class instance density per feature value intervals (UCF), reduces the dataset size. The third, Hybrid sampling based on class instance density per feature value intervals (HSCF), performs oversampling and undersampling simultaneously. These approaches partition the values of each feature into intervals according to their information content, calculate the class instance density within each interval, and generate feature values in the intervals that carry highly discriminative information. The generated feature values are then combined to synthesize minority class instances, thereby achieving oversampling. In addition, the study combines class instance density with feature importance to identify majority class instances that lie at the classification boundary and contribute least, and removes them to perform undersampling. Because the sampling ratios are adjustable, OCF and UCF can be integrated to perform hybrid sampling. Experiments on benchmark datasets demonstrate the effectiveness and superiority of the proposed methods. Furthermore, the proposed methods are observed to enhance the feature-splitting capability of decision tree classifiers; consequently, they achieve their best results, and the largest improvements in classification performance, when paired with decision tree classifiers.
All code has been published at https://github.com/Wangfeiopen/HSCF.
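The resampling steps summarized in the abstract can be illustrated with a short sketch. The Python below is a minimal, hypothetical illustration under stated assumptions, not the code published in the repository: it uses equal-width intervals rather than the paper's information-based partitioning, treats minority count times minority share as an interval's discriminative weight, draws each feature value independently before combining them into a synthetic minority row, and scores majority rows for removal by an importance-weighted majority share, where `importance` is a per-feature weight supplied by the caller (for example, a decision tree's feature importances). The function names `interval_densities`, `oversample`, `undersample`, and `hybrid` are illustrative only.

```python
import random
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Map a value to an equal-width interval index in [0, n_bins - 1]."""
    if hi == lo:
        return 0  # constant feature: every value shares one interval
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the maximum value belongs to the last interval


def interval_densities(X, y, n_bins):
    """Per feature, return (lo, hi, counts) where counts[b] is a Counter of
    class labels for the instances whose value falls in interval b."""
    stats = []
    for f in range(len(X[0])):
        col = [row[f] for row in X]
        lo, hi = min(col), max(col)
        counts = [Counter() for _ in range(n_bins)]
        for value, label in zip(col, y):
            counts[bin_index(value, lo, hi, n_bins)][label] += 1
        stats.append((lo, hi, counts))
    return stats


def oversample(X, y, minority, n_new, n_bins=5, seed=0):
    """OCF-style sketch: synthesize n_new minority rows feature by feature.

    Hypothetical weighting: an interval's weight is
    (minority count) * (minority share), so intervals dominated by the
    minority class are drawn most often. Assumes the minority class is present.
    """
    rng = random.Random(seed)
    stats = interval_densities(X, y, n_bins)
    weights = [
        [c[minority] ** 2 / sum(c.values()) if c[minority] else 0.0 for c in counts]
        for _, _, counts in stats
    ]
    synthetic = []
    for _ in range(n_new):
        row = []
        for (lo, hi, _), w in zip(stats, weights):
            b = rng.choices(range(n_bins), weights=w)[0]
            width = (hi - lo) / n_bins
            # Draw a value uniformly inside the chosen interval.
            row.append(rng.uniform(lo + b * width, lo + (b + 1) * width))
        synthetic.append(row)
    return X + synthetic, y + [minority] * n_new


def undersample(X, y, majority, n_remove, importance, n_bins=5):
    """UCF-style sketch: drop the n_remove majority rows with the lowest
    importance-weighted majority share over their intervals.

    Hypothetical score: a low value means the row sits in mixed (boundary)
    intervals of the most important features. `importance` has one weight
    per feature.
    """
    stats = interval_densities(X, y, n_bins)

    def score(row):
        total = 0.0
        for f, (lo, hi, counts) in enumerate(stats):
            c = counts[bin_index(row[f], lo, hi, n_bins)]
            total += importance[f] * c[majority] / sum(c.values())
        return total

    majority_rows = [i for i, label in enumerate(y) if label == majority]
    drop = set(sorted(majority_rows, key=lambda i: score(X[i]))[:n_remove])
    keep = [i for i in range(len(y)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def hybrid(X, y, minority, majority, n_new, n_remove, importance, n_bins=5, seed=0):
    """HSCF-style sketch: undersample the majority class, then oversample the minority."""
    X_u, y_u = undersample(X, y, majority, n_remove, importance, n_bins)
    return oversample(X_u, y_u, minority, n_new, n_bins, seed)
```

For example, `hybrid(X, y, minority=1, majority=0, n_new=3, n_remove=1, importance=[1.0, 1.0])` first removes one boundary majority row and then appends three synthetic minority rows; varying `n_new` and `n_remove` mimics the adjustable sampling ratios described in the abstract.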

Resampling approach for imbalanced data classification based on class instance density per feature value intervals

To Balance or Not to Balance: A Simple-yet-Effective Approach for Learning with Long-Tailed Distributions

An ensemble imbalanced classification method based on model dynamic selection driven by data partition hybrid sampling

Detecting representative data and generating synthetic samples to improve learning accuracy with imbalanced data sets

Improved Oversampling Algorithm for Imbalanced Data Based on K-Nearest Neighbor and Interpolation Process Optimization

Imbalanced Data Classification Based on Improved Random-SMOTE and Feature Standard Deviation

DDSC-SMOTE: an imbalanced data oversampling algorithm based on data distribution and spectral clustering

Hybrid approaches for handling imbalanced structured and unstructured data

Oversampling for Imbalanced Learning Based on K-Means and SMOTE