Abstract: In practical applications, imbalanced datasets significantly degrade the classification performance of machine learning models. Most conventional resampling approaches, however, do not adequately account for the varying contributions of individual features to the classification model. To address this limitation, this study introduces three novel resampling approaches. The first, Oversampling based on class instance density per feature value intervals (OCF), augments the dataset. The second, Undersampling based on class instance density per feature value intervals (UCF), reduces the dataset size. The third, Hybrid sampling based on class instance density per feature value intervals (HSCF), performs oversampling and undersampling simultaneously. These approaches partition feature values into intervals according to their information content, calculate the class instance densities within these intervals, and generate feature values in the intervals that carry high discriminative information. The generated feature values are then combined to synthesize minority class data, which achieves oversampling. For undersampling, the study combines class instance density with feature importance to identify the majority class data at the classification boundary that contribute least, and removes them. Because the sampling ratios are adjustable and OCF and UCF can be integrated, hybrid sampling follows directly. Experiments on the benchmark dataset demonstrate the superiority and effectiveness of the proposed method. The proposed method is also observed to enhance the feature-dividing capability of decision tree classifiers. Hence, it achieves its best results, and its most significant improvements in classification performance, when paired with decision tree classifiers.
All code is available at https://github.com/Wangfeiopen/HSCF.
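The oversampling idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' OCF implementation: the abstract does not specify how intervals are built or how density is defined, so the sketch assumes equal-width intervals, defines density as the minority share of instances in an interval, and samples each feature independently. The function name `oversample_by_interval_density` and its parameters are hypothetical.

```python
import numpy as np

def oversample_by_interval_density(X, y, minority_label, n_new, n_bins=5, seed=0):
    """Hypothetical sketch: synthesize minority rows feature by feature."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    is_min = y == minority_label
    synthetic = np.empty((n_new, X.shape[1]))
    for j in range(X.shape[1]):
        col = X[:, j]
        # Assumption: equal-width intervals over the observed feature range.
        edges = np.linspace(col.min(), col.max(), n_bins + 1)
        # Map each value to an interval index in [0, n_bins - 1].
        idx = np.clip(np.searchsorted(edges, col, side="right") - 1, 0, n_bins - 1)
        total = np.bincount(idx, minlength=n_bins)
        minority = np.bincount(idx[is_min], minlength=n_bins)
        # Assumption: density = share of minority instances in the interval.
        density = np.divide(minority, total, out=np.zeros(n_bins), where=total > 0)
        # Prefer intervals that are both populated by and dominated by the minority.
        weights = minority * density
        weights = weights / weights.sum()
        chosen = rng.choice(n_bins, size=n_new, p=weights)
        # Draw a value uniformly inside each chosen interval.
        synthetic[:, j] = rng.uniform(edges[chosen], edges[chosen + 1])
    # Combine the per-feature values into new minority rows.
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.full(n_new, minority_label)])
    return X_out, y_out
```

The sketch requires at least one minority instance, since otherwise every interval weight is zero. Sampling features independently ignores correlations between features; whether and how the published method handles this is not stated in the abstract.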

An approach to class imbalance problem based on stacking and inverse random under sampling methods

Imbalanced Data Sets Classification Method Based on Over-Sampling Technique

Class Imbalance Problem: A Wrapper-Based Approach using Under-Sampling with Ensemble Learning

A New Sampling Approach for Classification of Imbalanced Data Sets with High Density

A Novel SMOTE-Based Classification Approach to Online Data Imbalance Problem

Adaptive Sampling With Optimal Cost For Class-Imbalance Learning

Diversified Sensitivity-Based Undersampling for Imbalance Classification Problems

Exploratory Undersampling for Class-Imbalance Learning

A Density-based Under-sampling Algorithm for Imbalance Classification

IMCStacking: Cost-sensitive Stacking Learning with Feature Inverse Mapping for Imbalanced Problems

Imbalanced Data Classification Algorithm Based on Undersampling

Imbalanced Data Classification Method Based on Ensemble Learning

Trainable Undersampling for Class-Imbalance Learning

Under-sampling class imbalanced datasets by combining clustering analysis and instance selection

A majority affiliation based under-sampling method for class imbalance problem

Irusrt: A Novel Imbalanced Learning Technique by Combining Inverse Random under Sampling and Random Tree

Resampling approach for imbalanced data classification based on class instance density per feature value intervals

Novel resampling algorithms with maximal cliques for class-imbalance problems

Undersampling Near Decision Boundary for Imbalance Problems

A Fuzzy Consensus Clustering Based Undersampling Approach for Class Imbalanced Learning

Classifying Imbalanced Data Sets by a Novel RE-Sample and Cost-Sensitive Stacked Generalization Method