Abstract: Existing feature selection methods often neglect the distribution of the data, and most neighborhood rough set (NRS) models require the neighborhood radius to be chosen manually. These limitations lead to misclassified samples. To address them, this paper presents a mixed measure-based feature selection method that combines the Fisher score with an NRS model. First, the variation coefficient of each feature across decision classes is defined to describe the dispersion of the features; on this basis, the neighborhood class is defined and a novel NRS model is developed. The dependency degree, neighborhood knowledge granularity, and average neighborhood entropy are then defined, and a mixed measure combining the information view and the algebra view is proposed to quantify the uncertainty in neighborhood decision systems. Second, the average correlation degree of a feature subset is computed to assess the redundancy of the reduced feature subset. An adaptive neighborhood radius function is constructed from the classification accuracy of the selected features, the reduction rate of the classification result, and the average correlation degree of the reduced feature set, which avoids manual selection of the optimal neighborhood radius. An optimal feature subset is then obtained according to the internal and external significance of the features. Third, the variation coefficient of the samples in different decision classes is defined for each feature to compute the dispersion of the samples, and the average of all samples in each feature is added to the between-class scatter to eliminate the effect of the different measurement dimensions of the features; the Fisher score model improved in this way is used to eliminate noise in high-dimensional data. Finally, a heuristic feature selection algorithm that uses the Fisher score with the new NRS model is designed to select an optimal feature subset. Experiments on five low-dimensional UCI datasets and nine high-dimensional gene expression datasets show that the algorithm is effective and selects a reduced subset with high classification accuracy compared with several recent algorithms.
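The abstract names the Fisher score and the NRS dependency degree without giving formulas. As a rough illustration, the sketch below implements the textbook forms of both quantities, not the improved variants the paper proposes. The function names, the Euclidean distance, and the fixed radius are assumptions made for this example only.

```python
import numpy as np


def fisher_score(X, y):
    """Classic per-feature Fisher score: between-class over within-class scatter.

    Textbook form only; the paper's variation-coefficient correction is not included.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    # Guard against zero within-class scatter (feature constant inside every class).
    return between / np.maximum(within, 1e-12)


def neighborhood_dependency(X, y, features, radius):
    """Standard NRS dependency degree of the decision on a feature subset.

    Returns the fraction of samples whose neighborhood (Euclidean distance
    <= radius, an assumed metric) contains only samples of the same class.
    """
    Xs = np.asarray(X, dtype=float)[:, features]
    y = np.asarray(y)
    diff = Xs[:, None, :] - Xs[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))   # pairwise distances
    neighbors = dist <= radius                 # neighborhood membership
    same_class = y[:, None] == y[None, :]
    # A sample is consistent if every neighbor shares its decision class.
    consistent = np.all(~neighbors | same_class, axis=1)
    return float(consistent.mean())


if __name__ == "__main__":
    # Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
    X_demo = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [1.1, 0.9]]
    y_demo = [0, 0, 1, 1]
    print(fisher_score(X_demo, y_demo))
    print(neighborhood_dependency(X_demo, y_demo, [0], radius=0.2))
    print(neighborhood_dependency(X_demo, y_demo, [1], radius=0.2))
```

In this toy data the discriminative feature receives both the higher Fisher score and the higher dependency degree. The radius is fixed by hand here, which is the manual choice that the adaptive radius function described in the abstract is meant to avoid.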

Mixed Measure-Based Feature Selection Using the Fisher Score and Neighborhood Rough Sets
