Abstract: Learning from imbalanced datasets poses a major challenge in the data mining community. When dealing with imbalanced datasets, conventional classification algorithms generally perform poorly because they were originally designed for balanced class distributions. Although different methods exist for addressing this issue, sampling methods, especially over-sampling techniques, have shown great potential because they aim to improve the datasets themselves rather than the classifiers, which allows them to be used with any classifier. In this paper, we propose a novel adaptive weighted over-sampling method for imbalanced datasets based on density peaks clustering with heuristic filtering. Unlike other clustering-based over-sampling methods, the proposed approach clusters the minority instances with a modified density peaks clustering rather than traditional k-means clustering, because density peaks clustering can accurately identify sub-clusters of different sizes and densities. This allows the proposed method to address between-class and within-class imbalance simultaneously, whatever their causes. Subsequently, the oversampling size of each identified sub-cluster is adaptively determined according to its size and density, and the minority instances within each sub-cluster are then oversampled with probabilities inversely proportional to their distances to the majority class and to their densities, so that more synthetic instances are generated for borderline and sparser minority instances. Finally, to avoid generating overlapping instances, a heuristic filtering strategy is also developed to iteratively move possibly overlapping minority instances away from the majority class.
Extensive experimental results on different imbalanced datasets demonstrate that the proposed approach achieves better classification performance on most datasets than other existing over-sampling techniques. © 2020 Elsevier Inc. All rights reserved.
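The abstract describes the pipeline only at a high level, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. It uses the standard density peaks clustering procedure (Gaussian-kernel local density `rho` with cutoff distance `dc`, distance `delta` to the nearest denser point, centres chosen by the largest `rho * delta`) in place of the paper's modified variant, whose changes are not given here. The function names `density_peaks` and `weighted_oversample`, the parameters `dc`, `n_clusters` and `max_moves`, the sub-cluster sizing rule (inverse of a sub-cluster's total density), the seed weights (1 / (distance to the nearest majority instance × local density)), the SMOTE-style interpolation within a sub-cluster, and the filtering stand-in (pulling a synthetic point halfway back toward its minority seed while its nearest original neighbour is a majority instance) are all hypothetical choices.

```python
import math
import random


def density_peaks(points, dc, n_clusters):
    """Standard density peaks clustering (NOT the paper's modified variant).
    Returns (labels, local densities)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # rho: Gaussian-kernel local density with cutoff distance dc
    rho = [sum(math.exp(-(d[i][j] / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)
    # delta: distance to the nearest point of higher density;
    # by convention the densest point gets the largest pairwise distance
    delta, parent = [0.0] * n, [-1] * n
    delta[order[0]] = max(max(row) for row in d)
    for rank in range(1, n):
        i = order[rank]
        j = min(order[:rank], key=lambda k: d[i][k])
        delta[i], parent[i] = d[i][j], j
    # centres: the densest point plus the next largest rho * delta values
    rest = sorted(order[1:], key=lambda i: rho[i] * delta[i], reverse=True)
    centres = [order[0]] + rest[:n_clusters - 1]
    labels = [-1] * n
    for c_id, c in enumerate(centres):
        labels[c] = c_id
    for i in order:  # descending density, so each parent is labelled first
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho


def weighted_oversample(minority, majority, n_new, dc=1.0, n_clusters=2,
                        max_moves=5, seed=0):
    """Generate n_new synthetic minority points. The sizing, weighting and
    filtering rules below are hypothetical stand-ins for the paper's."""
    rng = random.Random(seed)
    labels, rho = density_peaks(minority, dc, n_clusters)
    clusters = [[i for i, l in enumerate(labels) if l == c]
                for c in range(n_clusters)]
    # Assumed quota rule: small and sparse sub-clusters receive more points
    raw = [1.0 / (sum(rho[i] for i in c) + 1e-9) for c in clusters]
    quota = [int(n_new * r / sum(raw)) for r in raw]
    by_weight = sorted(range(n_clusters), key=lambda c: raw[c], reverse=True)
    for c in by_weight[:n_new - sum(quota)]:
        quota[c] += 1  # hand out the rounding remainder
    # Assumed seed weight: inversely proportional to the distance to the
    # nearest majority instance and to the local density
    d_maj = [min(math.dist(x, m) for m in majority) for x in minority]
    w = [1.0 / ((d_maj[i] + 1e-9) * (rho[i] + 1e-9))
         for i in range(len(minority))]
    everything = [(p, False) for p in minority] + [(p, True) for p in majority]
    synthetic = []
    for c, members in enumerate(clusters):
        for _ in range(quota[c]):
            i = rng.choices(members, weights=[w[k] for k in members])[0]
            j = rng.choice(members)  # interpolation partner, same sub-cluster
            u = rng.random()
            s = [a + u * (b - a) for a, b in zip(minority[i], minority[j])]
            # Filtering stand-in: while the nearest original point is a
            # majority instance, move s halfway back toward its minority seed
            for _ in range(max_moves):
                _, is_major = min(everything, key=lambda e: math.dist(s, e[0]))
                if not is_major:
                    break
                s = [(a + b) / 2 for a, b in zip(s, minority[i])]
            synthetic.append(s)
    return synthetic
```

For example, with two well-separated minority groups, `weighted_oversample(minority, majority, n_new=10)` returns ten synthetic points, each interpolated inside one sub-cluster, with the sparser sub-cluster receiving the larger share; on real data, `dc` and `n_clusters` would need tuning.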

Adaptive Weighted Over-Sampling for Imbalanced Datasets Based on Density Peaks Clustering with Heuristic Filtering

Imbalanced Data Sets Classification Method Based on Over-Sampling Technique

An adaptive over-sampling method for imbalanced data based on simultaneous clustering and filtering noisy

Imbalanced Data Classification Algorithm Based on Integrated Sampling and Ensemble Learning

Over-sampling algorithm for imbalanced data classification

Global Data Distribution Weighted Synthetic Oversampling Technique for Imbalanced Learning

Spectral clustering based oversampling: oversampling taking within-class imbalance into consideration

Resampling approach for imbalanced data classification based on class instance density per feature value intervals

A New Sampling Approach for Classification of Imbalanced Data Sets with High Density

A Normal Distribution-Based Over-Sampling Approach to Imbalanced Data Classification

Sample Weighting: an Inherent Approach for Outlier Suppressing Discriminant Analysis

Adaptively weighted three-way decision oversampling: A cluster imbalanced-ratio based approach

A Density-based Under-sampling Algorithm for Imbalance Classification

A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios

An Improving Majority Weighted Minority Oversampling Technique for Imbalanced Classification Problem

Adaptive Sampling With Optimal Cost For Class-Imbalance Learning

WOTBoost: Weighted Oversampling Technique in Boosting for imbalanced learning

A Diversity-Based Synthetic Oversampling Using Clustering for Handling Extreme Imbalance

Self-adaptive cost weights-based support vector machine cost-sensitive ensemble for imbalanced data classification

Detecting representative data and generating synthetic samples to improve learning accuracy with imbalanced data sets