Abstract: With the rapid development of artificial intelligence in recent years, research on image processing, text mining, and genome informatics has deepened, and the mining of large-scale databases has received growing attention. The objects of data mining have become more complex, and their data dimensionality has grown ever higher. When the number of samples available for analysis is small relative to an ultra-high dimensionality, the result is high-dimensional small sample data, which exposes the mining process to a severe curse of dimensionality. Feature selection can effectively eliminate redundant and noisy features from such data, avoiding the curse of dimensionality and improving the practical efficiency of mining algorithms. However, existing feature selection methods emphasize the classification or clustering performance of the selected features and ignore the stability of the selection results; the selected features can therefore be unstable, which makes it difficult to obtain genuine and interpretable features. Building on traditional feature selection, this paper proposes an ensemble feature selection method, the Random Bits Forest Recursive Clustering Eliminate (RBF-RCE) method, which combines multiple sets of base classifiers that learn in parallel and screens out the best feature classification results. The method improves on the classification performance of traditional feature selection methods and can also improve the stability of feature selection. The paper then analyzes the causes of instability in feature selection and introduces a feature selection stability measure, the Intersection Measurement (IM), to evaluate whether the feature selection process is stable. The effectiveness of the proposed method is verified by experiments on several groups of high-dimensional small sample data sets.
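
To make the idea of a stability measure concrete, the following is a minimal sketch of an intersection-based score. The abstract does not define the Intersection Measurement (IM), so this is not the paper's formula: as a stand-in, the sketch uses the mean pairwise Jaccard index between feature subsets selected on different resamples of the data. The function name `pairwise_intersection_stability` and the example subsets are hypothetical.

```python
from itertools import combinations


def pairwise_intersection_stability(subsets):
    """Mean pairwise Jaccard overlap between selected-feature subsets.

    Assumption: this is a generic intersection-based stability score,
    not the IM definition from the paper (which the abstract omits).
    Returns a value in [0, 1]; 1.0 means every run chose the same features.
    """
    sets = [set(s) for s in subsets]
    if len(sets) < 2:
        raise ValueError("need at least two feature subsets")
    scores = []
    for a, b in combinations(sets, 2):
        union = a | b
        # Two empty subsets are treated as agreeing trivially.
        scores.append(len(a & b) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


# Hypothetical feature indices selected in three resampling runs.
runs = [[0, 3, 7], [0, 3, 9], [0, 3, 7]]
score = pairwise_intersection_stability(runs)
```

A score near 1 indicates that the selector returns nearly the same features across runs, while a score near 0 indicates that the selected features change almost entirely from run to run.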

Feature Selection with Partition Differentiation Entropy for Large-Scale Data Sets

$U^2F^2S^2$: Uncovering Feature-level Similarities for Unsupervised Feature Selection

Feature selection in mixed data: A method using a novel fuzzy rough set-based information entropy

Accelerating information entropy-based feature selection using rough set theory with classified nested equivalence classes

An Emerging Fuzzy Feature Selection Method Using Composite Entropy-Based Uncertainty Measure and Data Distribution

Feature Selection and Feature Stability Measurement Method for High-Dimensional Small Sample Data Based on Big Data Technology

Entropy based measure and its algorithms for scalable feature selection

Incremental neighborhood entropy-based feature selection for mixed-type data under the variation of feature set

Challenges of Feature Selection for Big Data Analytics

A Feature Selection Method Based on Feature Grouping and Genetic Algorithm

Feature Selection Based on Data Clustering

Incremental feature selection approach to multi-dimensional variation based on matrix dominance conditional entropy for ordered data set

Feature Selection for Unbalanced Distribution Hybrid Data Based on ${K}$-Nearest Neighborhood Rough Set

Feature Selection: A Data Perspective

An entropic feature selection method in perspective of Turing formula

Uncertainty Measure-Based Incremental Feature Selection For Hierarchical Classification

Selecting features by utilizing intuitionistic fuzzy Entropy method

Cascaded two-stage feature clustering and selection via separability and consistency in fuzzy decision systems

An Information-Theoretic Approach to Unsupervised Feature Selection for High-Dimensional Data