Abstract: The class imbalance problem (CIP) in a dataset is a major challenge that significantly degrades the performance of Machine Learning (ML) models, resulting in biased predictions. Numerous techniques have been proposed to address CIP, including, but not limited to, oversampling, undersampling, and cost-sensitive approaches. Because they can generate synthetic data, oversampling techniques such as the Synthetic Minority Oversampling Technique (SMOTE) are the methods most widely used by researchers. However, one potential disadvantage of SMOTE is that newly created minor samples may overlap with major samples, which increases the probability that ML models are biased toward the major classes. Generative adversarial networks (GANs) have recently garnered much attention due to their ability to create realistic samples. However, despite this potential, GANs are hard to train. Considering these opportunities and limitations, this work proposes two novel techniques, GAN-based Oversampling (GBO) and Support Vector Machine-SMOTE-GAN (SSG), to overcome the limitations of the existing approaches. The preliminary results show that SSG and GBO performed better on nine imbalanced benchmark datasets than several existing SMOTE-based approaches. Additionally, the proposed SSG and GBO methods classified the minor class with more than 90% accuracy when tested with 20%, 30%, and 40% of the test data. The study also revealed that the minor samples generated by SSG follow Gaussian distributions, which is often difficult to achieve using the original SMOTE and SVM-SMOTE.
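To make the overlap concern concrete, the following is a minimal sketch of the interpolation step at the core of basic SMOTE. It is not the proposed GBO or SSG method. The function name `smote_sketch`, its parameters, and the toy data are illustrative assumptions, not taken from the paper. Because each synthetic point is placed on the segment between a minor sample and one of its minor-class neighbours, with no reference to where major samples lie, new points can land in regions occupied by the major class.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Hypothetical helper: basic SMOTE-style interpolation (illustrative only).

    X_min : array of shape (n, d) holding minor-class samples (n >= 2).
    n_new : number of synthetic samples to generate.
    k     : number of nearest minor-class neighbours to choose from.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise Euclidean distances between minor samples.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # exclude each point from its own neighbours
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its k neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position on the segment base -> neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

# Toy usage: 20 minor samples in 2-D, oversampled with 80 synthetic points.
toy_rng = np.random.default_rng(42)
X_minor = toy_rng.normal(loc=0.0, scale=1.0, size=(20, 2))
X_new = smote_sketch(X_minor, n_new=80)
```

Every synthetic point is a convex combination of two existing minor samples, so the generated set never leaves the bounding box of the original minor class. Majority-class positions are never consulted, which is the source of the overlap the abstract describes.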

ISMOTE: A More Accurate Alternative for SMOTE

Imbalanced Data Sets Classification Method Based on Over-Sampling Technique

PF-SMOTE: A Novel Parameter-Free SMOTE for Imbalanced Datasets

A Classification Method For Imbalance Data Set Based on Kernel SMOTE

SMOTE: Synthetic Minority Over-sampling Technique

Over-sampling algorithm for imbalanced data classification

SP-SMOTE: A novel space partitioning based synthetic minority oversampling technique

A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios

Oversampling for Imbalanced Learning Based on K-Means and SMOTE

Learning class-imbalanced data with region-impurity synthetic minority oversampling technique

Improving SVM Classification with Imbalance Data Set

SMOTE-WENN: Solving class imbalance and small sample problems by oversampling and distance scaling

Augmenting the diversity of imbalanced datasets via multi-vector stochastic exploration oversampling

ESMOTE: an Overproduce-and-choose Synthetic Examples Generation Strategy Based on Evolutionary Computation

Minimum Enclosing Ball Synthetic Minority Oversampling Technique from a Geometric Perspective

An Over Sampling Method of Unbalanced Data Based on Ant Colony Clustering

SMOTE-RkNN: A hybrid re-sampling method based on SMOTE and reverse k-nearest neighbors

Enhancing and improving the performance of imbalanced class data using novel GBO and SSG: A comparative analysis

Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants

SMOTified-GAN for class imbalanced pattern classification problems

CSMOUTE: Combined Synthetic Oversampling and Undersampling Technique for Imbalanced Data Classification