Abstract: Traditional feature selection algorithms assume that the sample and feature spaces are known before learning, whereas in practice most data arrive as feature streams or data streams. Current streaming feature selection algorithms retain relevant features by removing redundant and irrelevant ones based on the interaction between features, but they ignore how many features take part in an interaction. Most existing studies consider only interactions between two features, which does not match most realistic scenarios, where the number of interacting features is unknown. This paper concentrates on high-order interactions between streaming features and proposes Combinatorial High-Order Interactive Feature Selection based on Dynamic GCN and Sparse learning (CHOIFS-DGS). Building on previous definitions of feature interaction, it proposes new metrics that measure the degree of interaction between a newly arrived feature and an already selected feature. CHOIFS-DGS consists of three main parts: low-order online feature selection based on the interaction measure (LO-OIFS), high-order online feature selection based on a dynamic GCN (HO-OIFS), and intra-group sparse feature selection (Group-Sparse). The experimental analysis employs two different classifiers and eleven publicly released data sets, including disease-related gene data and data from two classification challenges (the NeurIPS 2003 feature selection challenge and the WCCI 2006 Performance Prediction Challenge). The experimental results demonstrate that the proposed CHOIFS-DGS model significantly improves classification accuracy on all eleven data sets while using a relatively small number of features, thus fulfilling the role of key feature selection. Furthermore, when each of the three sub-modules is applied separately to extract features and the results are compared with those of CHOIFS-DGS, CHOIFS-DGS performs worse than the individual sub-modules on only three data sets and significantly better on the remaining eight. This indicates that the integrated use of the three sub-modules can enhance model accuracy. Finally, in the ablation experiment, to verify the necessity of considering higher-order interactions among features, the results of the HO-OIFS module were compared with those of the other modules. The results show that the model's accuracy improves significantly after incorporating the HO-OIFS module, demonstrating that considering higher-order interactions between features is essential.
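
The abstract does not give the proposed interaction metrics, so the following is only a minimal sketch of one standard information-theoretic definition of pairwise feature interaction from the literature, interaction gain: I(f_new, f_sel; y) - I(f_new; y) - I(f_sel; y). It is not the CHOIFS-DGS metric. The function names and the XOR toy data are assumptions made for illustration, and the features are assumed to be discrete.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Shannon entropy (in bits) of the joint distribution of discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(x_cols, y):
    """I(X; y), where X is the joint of one or more discrete columns."""
    return entropy(*x_cols) + entropy(y) - entropy(*x_cols, y)


def interaction_gain(f_new, f_sel, label):
    """Illustrative (hypothetical) measure: I(f_new, f_sel; y) - I(f_new; y) - I(f_sel; y).

    Positive values suggest the two features carry more information about the
    label jointly than separately; negative values suggest redundancy.
    """
    joint = mutual_information([f_new, f_sel], label)
    return (joint
            - mutual_information([f_new], label)
            - mutual_information([f_sel], label))


# Toy data (assumed): the label is the XOR of two binary features, so neither
# feature is informative alone but together they determine the label.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
y = [a ^ b for a, b in zip(f1, f2)]

gain = interaction_gain(f1, f2, y)
print(f"interaction gain on XOR data: {gain:.3f} bits")
```

On the XOR data each feature alone is uninformative about the label, so the whole bit of label information appears only in the joint term. This is the kind of interaction that a purely relevance-based streaming filter would discard.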

BSSReduce an $O(\left|U\right|)$ Incremental Feature Selection Approach for Large-Scale and High-Dimensional Data

U^2F^2S^2: Uncovering Feature-level Similarities for Unsupervised Feature Selection

A Contrast Based Feature Selection Algorithm for High-dimensional Data set in Machine Learning

Large-Scale Online Feature Selection for Ultra-High Dimensional Sparse Data.

Invariant optimal feature selection: A distance discriminant and feature ranking based solution

Online Feature Selection for Mining Big Data

Challenges of Feature Selection for Big Data Analytics

Parallel Selector for Feature Reduction

Incremental feature selection approach to multi-dimensional variation based on matrix dominance conditional entropy for ordered data set

A Feature Selection Method Based on Feature Grouping and Genetic Algorithm

Large-scale Multi-objective Feature Selection: A Multi-phase Search Space Shrinking Approach

Consistent Matrix: A Feature Selection Framework for Large-Scale Data Sets

Incremental feature selection based on fuzzy rough sets

A fusion of centrality and correlation for feature selection

Anti-hapten antibody induction with deoxyribonucleic acid carrier.

Analysis and comparison of feature selection methods towards performance and stability

Feature Selection: A Data Perspective

Large-Scale Meta-Heuristic Feature Selection Based on BPSO Assisted Rough Hypercuboid Approach

Combinatorial Online High-Order Interactive Feature Selection Based on Dynamic Graph Convolution Network

Feature Selection and Feature Stability Measurement Method for High-Dimensional Small Sample Data Based on Big Data Technology