Abstract: For imbalanced data, classification efficiency degrades significantly because information about the positive class is missing, and existing sampling schemes do not consider the distribution of the samples. In addition, the global parameters of fuzzy neighborhoods are set manually. These defects reduce the effectiveness of the classifier. To address these problems, we propose an adaptive fuzzy multi-neighborhood feature selection method with intercluster distance-based hybrid sampling for class-imbalanced data. First, the number of clusters is defined in terms of the number of samples in the negative or positive class. The initial cluster centers are determined according to the number of clusters, and the dissimilarity and similarity measures are calculated from the intercluster distances between samples. The cluster centers, fuzzy membership matrix, and intercluster distance are then studied, and the optimization objective function is designed. The hybrid sampling scheme combines the generated positive-class samples with the negative-class samples to obtain a class-balanced system. Second, according to the sample distribution, the standard deviation and a set of adaptive fuzzy multi-neighborhood radii are designed. A fuzzy multi-neighborhood similarity relation is defined by introducing a Gaussian kernel model to obtain a fuzzy multi-neighborhood granule, and an improved fuzzy multi-neighborhood rough set model is provided. Uncertainty measures of fuzzy neighborhood systems are evaluated by the positive region and the dependency. Third, by integrating fuzzy dependency with fuzzy complementary conditional entropy, fuzzy multi-neighborhood complementary mutual information is defined from both the algebraic and the information-theoretic viewpoints. Finally, a heuristic feature subset selection method for imbalanced classification with hybrid sampling using fuzzy c-means clustering is developed to obtain a high-quality feature subset. Experiments on 26 imbalanced datasets demonstrate the effectiveness of the designed algorithm.
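
To make the second step more concrete, the following Python sketch illustrates how a Gaussian-kernel fuzzy similarity relation with per-feature radii derived from the standard deviation might be built, and how a fuzzy dependency could then be computed from the positive region. It is only an illustration under stated assumptions, not the model defined in the paper: the radius rule `std / lam`, the parameter `lam`, the minimum over features, and the classical fuzzy rough lower approximation are all choices made here for the example, and the function names are hypothetical.

```python
import numpy as np


def fuzzy_similarity(X, lam=1.0):
    """Gaussian-kernel fuzzy similarity matrix over all features.

    Assumption: each feature gets its own radius std(feature) / lam,
    and per-feature similarities are combined with the minimum.
    """
    X = np.asarray(X, dtype=float)
    radii = X.std(axis=0) / lam
    radii = np.where(radii > 0, radii, 1.0)  # guard against constant features
    diff = X[:, None, :] - X[None, :, :]     # pairwise differences, shape (n, n, m)
    per_feature = np.exp(-(diff ** 2) / (2.0 * radii ** 2))
    return per_feature.min(axis=2)           # shape (n, n), values in (0, 1]


def fuzzy_dependency(X, y, lam=1.0):
    """Fuzzy dependency of the decision y on the features in X.

    Uses the classical fuzzy rough lower approximation with crisp classes:
    the membership of a sample in the positive region is the minimum of
    1 - R(x, z) over all samples z that belong to a different class.
    """
    R = fuzzy_similarity(X, lam)
    y = np.asarray(y)
    different = y[:, None] != y[None, :]
    # Same-class pairs do not constrain the lower approximation, so use 1.0.
    one_minus = np.where(different, 1.0 - R, 1.0)
    positive_region = one_minus.min(axis=1)
    return float(positive_region.mean())


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    informative = [[0.0], [0.1], [10.0], [10.1]]  # separates the two classes
    uninformative = [[0.0], [1.0], [0.0], [1.0]]  # identical across classes
    print(fuzzy_dependency(informative, y))
    print(fuzzy_dependency(uninformative, y))
```

In this sketch, a feature that separates the classes yields a larger dependency than one whose values coincide across classes, which is the property a dependency-driven heuristic search can exploit when ranking candidate features.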

A cluster impurity-based hybrid resampling for imbalanced classification problems

Imbalanced Data Sets Classification Method Based on Over-Sampling Technique

A Novel SVM Modeling Approach For Highly Imbalanced And Overlapping Classification

Imbalanced Data Classification Algorithm Based on Integrated Sampling and Ensemble Learning

Hybrid SVM algorithm oriented to classifying imbalanced datasets

Resampling approach for imbalanced data classification based on class instance density per feature value intervals

A Classification Method For Imbalance Data Set Based on Kernel SMOTE

Noise-robust Oversampling for Imbalanced Data Classification

A hybrid ensemble and evolutionary algorithm for imbalanced classification and its application on bioinformatics

Anomaly detection-based undersampling for imbalanced classification problems

A hybrid sampling method for highly imbalanced and overlapped data classification with complex distribution

Adaptive Fuzzy Multi-Neighborhood Feature Selection with Hybrid Sampling and Its Application for Class-Imbalanced Data

A Density-based Under-sampling Algorithm for Imbalance Classification

An empirical evaluation of sampling methods for the classification of imbalanced data

An Empirical Study on the Joint Impact of Feature Selection and Data Re-sampling on Imbalance Classification

Adaptive Sampling With Optimal Cost For Class-Imbalance Learning

An adaptive over-sampling method for imbalanced data based on simultaneous clustering and filtering noisy

Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties

Selecting the Suitable Resampling Strategy for Imbalanced Data Classification Regarding Dataset Properties. An Approach Based on Association Models

Resampling Techniques Study on Class Imbalance Problem in Credit Risk Prediction

Novel resampling algorithms with maximal cliques for class-imbalance problems