Abstract: With the rapid development of artificial intelligence in recent years, research on image processing, text mining, and genome informatics has deepened, and the mining of large-scale databases has received growing attention. The objects of data mining have become more complex, and their data dimensionality keeps increasing. When the number of samples available for analysis is small relative to an ultra-high number of dimensions, the result is high-dimensional small sample data, which exposes the mining process to a severe curse of dimensionality. Feature selection can effectively eliminate redundant and noisy features in such data, mitigating the curse of dimensionality and improving the practical efficiency of mining algorithms. However, existing feature selection methods emphasize the classification or clustering performance of the selected features and ignore the stability of the selection itself. The selected feature subsets can therefore be unstable, which makes it difficult to obtain features that are real and interpretable. Building on traditional feature selection, this paper proposes an ensemble feature selection method, Random Bits Forest Recursive Clustering Eliminate (RBF-RCE), which combines multiple sets of base classifiers that learn in parallel and screens out the best feature classification results. The method optimizes the classification performance of traditional feature selection methods and can also improve the stability of feature selection. The paper then analyzes the causes of feature selection instability and introduces a stability measure for feature selection, the Intersection Measurement (IM), to evaluate whether the feature selection process is stable. The effectiveness of the proposed method is verified by experiments on several high-dimensional small sample data sets.
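
The abstract names the Intersection Measurement (IM) but does not give its formula, so the following is only an illustrative sketch of an intersection-based stability score, not the paper's definition. It assumes that feature selection has been repeated on several resamples of the data and that stability is the mean pairwise overlap |A ∩ B| / min(|A|, |B|) between the selected subsets. The function name `intersection_stability` and the feature labels are hypothetical.

```python
from itertools import combinations


def intersection_stability(subsets):
    """Mean pairwise overlap of selected-feature subsets.

    Assumption (not taken from the paper): the overlap of two subsets
    A and B is |A ∩ B| / min(|A|, |B|), so 1.0 means every run chose
    the same features and 0.0 means no two runs share a feature.
    """
    sets = [set(s) for s in subsets]
    if len(sets) < 2:
        raise ValueError("need at least two feature subsets")

    scores = []
    for a, b in combinations(sets, 2):
        denom = min(len(a), len(b))
        # Two empty subsets share nothing; score that pair as 0.0.
        scores.append(len(a & b) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical output of three feature selection runs on resampled data.
runs = [
    ["g1", "g2", "g3"],
    ["g1", "g2", "g4"],
    ["g1", "g5", "g3"],
]
stability = intersection_stability(runs)
print(f"stability = {stability:.3f}")
```

A score near 1 would suggest that the selection is robust to resampling, while a score near 0 would suggest that the chosen features depend heavily on which samples happened to be drawn.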

Feature Selection and Feature Stability Measurement Method for High-Dimensional Small Sample Data Based on Big Data Technology