Abstract: Minority oversampling is currently one of the most popular and effective methods for handling imbalanced data. However, oversampling that relies on minority-class observations to generate new samples is not applicable to imbalanced data with extremely scarce minority samples, because the strongly underrepresented minority class does not contain enough information to support the oversampling process. Although some recent studies have shown the effectiveness of using majority information to bootstrap oversampling, neglecting class overlap in the sampling process increases the degree of overlap and complicates the decision boundary. To this end, this paper proposes Mahalanobis distance and Local information based OverSampling (MLOS) for highly imbalanced class-overlapped data. MLOS first employs the majority density to guide sample synthesis, using the Mahalanobis distance to extract the majority probability contour. Then, for each minority seed sample, to avoid generating overlapping samples, MLOS constrains the synthesis process by finding the auxiliary sample (among the seed's 5 nearest neighbors) whose probability density value is similar to that of the seed. Finally, MLOS uses a pair-wise data cleaning process, based on the probability density of the synthetic samples, to improve the visibility of the decision boundary. Comparative experiments conducted on 16 highly imbalanced class-overlapped datasets, using 17 different methods, demonstrate the superiority of the proposed method in terms of three popular evaluation metrics for imbalanced classification: AUC, G-mean, and Recall. The source code of MLOS is available at https://github.com/ytyancp/MLOS.
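The first two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it rests on several assumptions that the abstract does not confirm: the majority class is summarized by a single mean and covariance, the Mahalanobis distance to that majority distribution stands in for the "probability density value", the 5 nearest neighbors are taken among minority samples using Euclidean distance, and new samples are produced by linear interpolation between the seed and the auxiliary sample. The function names `mahalanobis_to_majority` and `synthesize` are hypothetical, and the pair-wise cleaning step is omitted.

```python
import numpy as np


def mahalanobis_to_majority(X, X_maj):
    """Mahalanobis distance of each row of X to the majority-class distribution.

    Assumption: the majority class is described by one mean and one covariance.
    """
    mu = X_maj.mean(axis=0)
    cov = np.atleast_2d(np.cov(X_maj, rowvar=False))
    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular covariance
    diff = X - mu
    sq = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negative round-off


def synthesize(X_min, X_maj, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples (illustrative sketch only).

    Requires at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    d_min = mahalanobis_to_majority(X_min, X_maj)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))  # pick a minority seed
        # k nearest minority neighbors of the seed (Euclidean, assumption)
        order = np.argsort(np.linalg.norm(X_min - X_min[i], axis=1))
        neighbors = order[order != i][:k]
        # auxiliary sample: neighbor lying on the most similar majority contour
        j = neighbors[np.argmin(np.abs(d_min[neighbors] - d_min[i]))]
        lam = rng.random()
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Choosing the neighbor whose Mahalanobis distance is closest to the seed's keeps the interpolation segment roughly along one majority contour, rather than letting it point toward the dense majority region, which is one plausible reading of how the constraint limits overlapping samples.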

Transfer Synthetic Over-Sampling for Class-Imbalance Learning with Limited Minority Class Data

Imbalanced Data Sets Classification Method Based on Over-Sampling Technique

Oversampling With Reliably Expanding Minority Class Regions for Imbalanced Data Learning

A Synthetic Minority Oversampling Method Based on Local Densities in Low-Dimensional Space for Imbalanced Learning

Synthetic oversampling with Mahalanobis distance and local information for highly imbalanced class-overlapped data

A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios

Trainable Undersampling for Class-Imbalance Learning

Over-sampling Algorithm Based on Preliminary Classification in Imbalanced Data Sets Learning

A Normal Distribution-Based Over-Sampling Approach to Imbalanced Data Classification

Exploratory Undersampling for Class-Imbalance Learning

WOTBoost: Weighted Oversampling Technique in Boosting for imbalanced learning

Synthetic Over-sampling with the Minority and Majority Classes for Imbalance Problems

Transfer and share: semi-supervised learning from long-tailed data

Synthetic Information towards Maximum Posterior Ratio for deep learning on Imbalanced Data

A Classfication Method For Imbalance Data Set Based on Kernel SMOTE

Adaptive Sampling With Optimal Cost For Class-Imbalance Learning

Over-sampling algorithm for imbalanced data classification

A majority affiliation based under-sampling method for class imbalance problem

Noise-free sampling with majority framework for an imbalanced classification problem