Abstract: The coupling of class imbalance and class overlap is a primary cause of performance degradation in classifiers. Unfortunately, this coupling is common in practice. In this paper, we propose a hybrid sampling method based on an optimized generative adversarial network and natural neighbor search (HD-GNNS) for this scenario. The approach considers both the global and the local distribution of the minority class in order to improve the data distribution itself. First, natural neighbor search combined with Fisher's discriminant ratio is used to screen out the overlapped sample subset and remove noise samples; because the search radius is determined adaptively, this step overcomes parameter sensitivity. Then, an encoder with a squeeze-and-excitation block is introduced into the generative adversarial network, and the network structure is optimized with cross-layer and low-rank matrices, so that it better captures the distribution characteristics of the minority samples in the overlapped subset for oversampling. Afterwards, the local density of the majority samples in the overlapped subset is calculated with the same natural neighbor search, and Thornton's Separation Index is used to under-sample adaptively. We evaluate the proposed approach on 1 artificial dataset, 14 UCI datasets, and 8 real-world datasets. The experimental results show that HD-GNNS achieves better performance than the benchmark methods.
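As a rough illustration of the two separability measures named in the abstract, the sketch below computes Fisher's discriminant ratio and Thornton's Separation Index with NumPy using their textbook definitions. It is not the HD-GNNS implementation. The function names, the binary-class restriction, the brute-force Euclidean nearest-neighbor search, and the handling of zero-variance features are all assumptions made for this example.

```python
import numpy as np


def fisher_discriminant_ratio(X, y):
    """Maximum per-feature Fisher ratio (mu_a - mu_b)^2 / (var_a + var_b).

    Higher values suggest less overlap between the two classes.
    Assumption: binary labels; features with zero variance in both
    classes are assigned a ratio of 0 (a simplification).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("expected exactly two classes")
    a, b = X[y == classes[0]], X[y == classes[1]]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0)
    ratio = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(ratio.max())


def thornton_separation_index(X, y):
    """Fraction of samples whose nearest neighbor shares their label.

    Assumption: brute-force Euclidean distances, which is only
    practical for small datasets.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    nearest = dist.argmin(axis=1)
    return float((y[nearest] == y).mean())


# Toy data: two well-separated clusters (hypothetical values).
X_demo = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
y_demo = np.array([0, 0, 0, 1, 1, 1])
fdr_demo = fisher_discriminant_ratio(X_demo, y_demo)
si_demo = thornton_separation_index(X_demo, y_demo)
```

On well-separated data such as the toy example, both measures are high; on an overlapped subset, both drop, which is the kind of signal a screening or adaptive under-sampling step could use.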

Model-Based Oversampling for Imbalanced Sequence Classification

Imbalanced Data Sets Classification Method Based on Over-Sampling Technique

Weakly Supervised-Based Oversampling for High Imbalance and High Dimensionality Data Classification

A novel generative adversarial networks modelling for the class imbalance problem in high dimensional omics data

Noise-robust Oversampling for Imbalanced Data Classification

Minority Oversampling for Imbalanced Time Series Classification

A hybrid sampling method for highly imbalanced and overlapped data classification with complex distribution

VOS: a Method for Variational Oversampling of Imbalanced Data

A Normal Distribution-Based Over-Sampling Approach to Imbalanced Data Classification

Improved Oversampling Algorithm for Imbalanced Data Based on K-Nearest Neighbor and Interpolation Process Optimization

Synthetic Oversampling: Theory and A Practical Approach Using LLMs to Address Data Imbalance

GenSample: A Genetic Algorithm for Oversampling in Imbalanced Datasets

Conditional Wasserstein GAN-based Oversampling of Tabular Data for Imbalanced Learning

Oversampling for Imbalanced Learning Based on K-Means and SMOTE

Similar classes latent distribution modelling-based oversampling method for imbalanced image classification

A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios

Multiple adaptive over-sampling for imbalanced data evidential classification

Boundary-Focused Generative Adversarial Networks for Imbalanced and Multimodal Time Series

Binary imbalanced data classification based on diversity oversampling by generative models

Oversampling With Reliably Expanding Minority Class Regions for Imbalanced Data Learning