Abstract: Feature selection is an important process in machine learning and knowledge discovery. By selecting the most informative features and eliminating irrelevant ones, the performance of learning algorithms can be improved and the extraction of meaningful patterns and insights from data can be facilitated. However, most existing feature selection methods encounter a bottleneck of high computational cost when applied to large datasets. To address this problem, we propose a novel filter feature selection method, ContrastFS, which selects discriminative features based on the discrepancies that features show between different classes. We introduce a dimensionless quantity as a surrogate representation to summarize the distributional individuality of certain classes; based on this quantity, we evaluate features and study the correlations among them. We validate the effectiveness and efficiency of our approach on several widely studied benchmark datasets; the results show that the new method performs favorably with negligible computation in comparison with other state-of-the-art feature selection methods.

What problem does this paper attempt to address?

The problem this paper addresses is feature selection in high-dimensional datasets. Specifically, most existing feature selection methods hit a bottleneck of high computational cost when applied to large-scale datasets. To solve this problem, the authors propose a new filter-based feature selection method, ContrastFS, which selects discriminative features based on the differences they show between classes. The method introduces a dimensionless quantity as a proxy representation that summarizes the distributional characteristics of a specific class, then evaluates features and their correlations based on this quantity. The paper verifies the effectiveness and efficiency of the method on several widely studied benchmark datasets, and the results show that it performs favorably against other state-of-the-art feature selection methods at almost negligible computational cost.

### Core contributions of the paper

1. **Proposing a new feature selection method**: Called ContrastFS, this method constructs proxy representations to capture the statistical characteristics of each class and evaluates the importance of features by quantifying their differences among classes.
2. **Experimental verification**: Experiments on multiple real-world datasets show that the method is fast to compute, has a clear interpretation, and offers a good balance between classification accuracy and running time.
3. **Application of proxy representation**: The paper demonstrates how to use these proxy representations to study the correlations between features, thereby improving performance while maintaining efficiency.
4. **Stability and performance enhancement**: The stability and performance of the method are enhanced through the bootstrap method.

### Method overview

1. **Problem definition**:
   - The goal is to find a subset \(T^*\) of size \(m\) from the original feature set \(S = \{f_1,\ldots,f_d\}\) such that its utility \(U\) is maximized.
   - The mathematical expression is:
     \[ T^*=\arg\max_{T\subseteq S}U(T),\quad\text{s.t.}\ |T| = m,\ m < d \]
   - In practical applications, since the exact probability distribution \(p(x)\) is unknown, the feature selection problem needs to be transformed into an empirical form:
     \[ T^*=\arg\max_{T\subseteq S}F(X_T),\quad\text{s.t.}\ |T| = m,\ m < d \]
     where \(F(X_T)=\hat{U}(T)\) is the utility estimate of the feature subset \(T\).
2. **Solution**:
   - **Constructing proxy representation**: Through standardization and low-order sample moment calculations, construct the proxy representation \(Z_k\) for each class:
     \[ Z_k^t = C_v^k\frac{\mu_k^t-\mu^t}{\sigma_k^t-\bar{\sigma}^t},\quad t\in\{1,\ldots,d\},\ k\in\{1,\ldots,C\} \]
     where \(\bar{\sigma}^t\) is the mean of \(\sigma_k^t\), and \(C_v^k\) is the coefficient of variation, which can be set according to the specific situation.
   - **Evaluating features**: Calculate the average difference of each feature between different classes as the importance score \(I(f_t)\) of the feature:
     \[ I(f_t)=\frac{1}{C(C - 1)}\sum_i\sum_{j\neq i}\left|C_v^i\frac{\mu_i^t-\mu^t}{\sigma_i^t-\bar{\sigma}^t}-C_v^j\frac{\mu_j^t-\mu^t}{\sigma_j^t-\bar{\sigma}^t}\right| \]
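The proxy representation and importance score above can be sketched in NumPy as follows. This is a minimal illustration of the summary's formulas as written, not the authors' implementation. The function names are hypothetical, and several details are assumptions the summary does not state: \(\mu^t\) is taken as the overall mean of feature \(t\), \(\mu_k^t\) and \(\sigma_k^t\) as the per-class mean and population standard deviation, \(\bar{\sigma}^t\) as the mean of \(\sigma_k^t\) over classes, every \(C_v^k\) defaults to 1, and a small `eps` guard replaces a zero denominator.

```python
import numpy as np


def proxy_representation(X, y, cv=None, eps=1e-12):
    """Return Z with Z[k, t] = cv[k] * (mu_k^t - mu^t) / (sigma_k^t - sigma_bar^t).

    Assumptions (not stated in the summary): mu^t is the overall feature mean,
    sigma_bar^t averages the per-class standard deviations over classes,
    and cv defaults to 1 for every class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)

    mu = X.mean(axis=0)                                            # shape (d,)
    mu_k = np.stack([X[y == c].mean(axis=0) for c in classes])     # shape (C, d)
    sigma_k = np.stack([X[y == c].std(axis=0) for c in classes])   # shape (C, d)

    denom = sigma_k - sigma_k.mean(axis=0)                         # shape (C, d)
    # Assumed guard: the denominator as written can be exactly zero.
    tiny = np.abs(denom) < eps
    denom = np.where(tiny, np.where(denom < 0, -eps, eps), denom)

    if cv is None:
        cv = np.ones(len(classes))
    cv = np.asarray(cv, dtype=float)[:, None]                      # shape (C, 1)
    return cv * (mu_k - mu) / denom


def importance_scores(Z):
    """Average absolute pairwise difference of Z over ordered class pairs i != j."""
    C = Z.shape[0]
    # The i == j terms are zero, so summing over all pairs equals summing over j != i.
    diffs = np.abs(Z[:, None, :] - Z[None, :, :])                  # shape (C, C, d)
    return diffs.sum(axis=(0, 1)) / (C * (C - 1))


def select_top_m(X, y, m):
    """Indices of the m highest-scoring features, best first."""
    scores = importance_scores(proxy_representation(X, y))
    return np.argsort(scores)[::-1][:m]
```

Because the denominator is a difference of standard deviations, it can be zero or negative, so scores computed this way are sensitive to how that case is handled; the `eps` guard here is only one possible choice. Scoring needs a single pass over the data per class, which is consistent with the low computational cost the summary describes for a filter method.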

A Contrast Based Feature Selection Algorithm for High-dimensional Data set in Machine Learning
