Abstract: Feature selection is an important process in machine learning and knowledge discovery. By selecting the most informative features and eliminating irrelevant ones, it can improve the performance of learning algorithms and facilitate the extraction of meaningful patterns and insights from data. However, most existing feature selection methods encounter a bottleneck of high computational cost when applied to large datasets. To address this problem, we propose a novel filter feature selection method, ContrastFS, which selects discriminative features based on the discrepancies that features show between different classes. We introduce a dimensionless quantity as a surrogate representation that summarizes the distributional individuality of certain classes; based on this quantity, we evaluate features and study the correlations among them. We validate the effectiveness and efficiency of our approach on several widely studied benchmark datasets. The results show that the new method performs favorably, with negligible computation, in comparison with other state-of-the-art feature selection methods.
What problem does this paper attempt to address?
This paper addresses feature selection in high-dimensional datasets. Specifically, most existing feature selection methods run into a bottleneck of high computational cost when applied to large-scale datasets. To solve this problem, the authors propose a new filter-based feature selection method, ContrastFS, which selects discriminative features based on the differences they show between classes. The method introduces a dimensionless quantity as a proxy representation that summarizes the distributional characteristics of a specific class, and it evaluates features and their correlations based on this quantity. The paper verifies the effectiveness and efficiency of the method on several widely studied benchmark datasets, and the results show that it performs favorably against other state-of-the-art feature selection methods at almost negligible computational cost.
### Core contributions of the paper
1. **Proposing a new feature selection method**: Called ContrastFS, this method constructs proxy representations to capture the statistical characteristics of each class and evaluates the importance of features by quantifying their differences among different classes.
2. **Experimental verification**: Experiments on multiple real-world datasets show that the method computes quickly, has clear significance, and strikes a good balance between classification accuracy and running time.
3. **Application of proxy representation**: Demonstrates how to use these proxy representations to study the correlations between features, thereby improving performance while maintaining efficiency.
4. **Stability and performance enhancement**: The stability and performance of the method are enhanced through the bootstrap method.
### Method overview
1. **Problem definition**:
- The goal is to find a subset \(T^*\) of size \(m\) from the original feature set \(S = \{f_1,\ldots,f_d\}\) that maximizes the utility \(U\).
- The mathematical expression is:
\[
T^*=\arg\max_{T\subseteq S}U(T),\quad\text{s.t.}\ |T| = m,\ m < d
\]
- In practical applications, since the exact probability distribution \(p(x)\) is unknown, the feature selection problem needs to be transformed into an empirical form:
\[
T^*=\arg\max_{T\subseteq S}F(X_T),\quad\text{s.t.}\ |T| = m,\ m < d
\]
where \(F(X_T)=\hat{U}(T)\) is the utility estimate of the feature subset \(T\).
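To make the empirical problem concrete, the following is a minimal sketch of solving it by exhaustive search over all size-\(m\) subsets. It is not the paper's method: the function names `best_subset` and `mean_gap_utility` are hypothetical, and the toy utility (sum of absolute class-mean gaps for binary labels) is an assumption chosen only for illustration.

```python
from itertools import combinations

import numpy as np


def mean_gap_utility(X_T, y):
    """Toy utility F(X_T) (an assumption, not from the paper):
    sum over the selected features of |mean(class 1) - mean(class 0)|."""
    gap = X_T[y == 1].mean(axis=0) - X_T[y == 0].mean(axis=0)
    return float(np.abs(gap).sum())


def best_subset(X, y, m, utility):
    """Exhaustively evaluate every size-m feature subset T and
    return the one with the highest estimated utility F(X_T)."""
    d = X.shape[1]
    best_T, best_score = None, -np.inf
    for T in combinations(range(d), m):
        score = utility(X[:, list(T)], y)
        if score > best_score:
            best_T, best_score = T, score
    return best_T, best_score
```

The loop visits \(\binom{d}{m}\) subsets, so this direct approach becomes impractical as \(d\) grows. Filter methods sidestep the search by scoring each feature individually and keeping the top \(m\).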
2. **Solution**:
- **Constructing proxy representation**: Through standardization and low-order sample moment calculations, construct the proxy representation \(Z_k\) for each class:
\[
Z_k^t = C_v^k\frac{\mu_k^t-\mu^t}{\sigma_k^t-\bar{\sigma}^t},\quad t\in\{1,\ldots,d\},\ k\in\{1,\ldots,C\}
\]
where \(\bar{\sigma}^t\) is the mean of \(\sigma_k^t\) over the classes \(k\), and \(C_v^k\) is the coefficient of variation, which can be set according to the specific situation.
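The proxy representation can be sketched in NumPy as follows. This is a hedged illustration that follows the formula exactly as written above, not a reference implementation: the function name `proxy_representation` is hypothetical, \(\mu^t\) is assumed to be the overall sample mean of feature \(t\), \(C_v^k\) defaults to 1 for every class, and the input is assumed to be already standardized.

```python
import numpy as np


def proxy_representation(X, y, cv=None):
    """Return Z with shape (C, d), where Z[k, t] follows the formula above.

    Assumptions (not confirmed by the source): mu^t is the overall mean of
    feature t, and cv (the C_v^k factors) defaults to 1 for every class.
    """
    classes = np.unique(y)
    mu = X.mean(axis=0)  # overall mean mu^t, shape (d,)
    # Per-class sample means mu_k^t and standard deviations sigma_k^t.
    mu_k = np.stack([X[y == c].mean(axis=0) for c in classes])
    sigma_k = np.stack([X[y == c].std(axis=0) for c in classes])
    sigma_bar = sigma_k.mean(axis=0)  # mean of sigma_k^t over classes
    if cv is None:
        cv = np.ones(len(classes))
    cv = np.asarray(cv, dtype=float)
    # The denominator is zero when a class std equals the class-average std;
    # this sketch does not guard against that case.
    return cv[:, None] * (mu_k - mu) / (sigma_k - sigma_bar)
```

Only first and second sample moments per class are needed, so the cost is a single pass over the data for each class.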
- **Evaluating features**: Calculate the average difference of each feature between different classes as the importance score \(I(f_t)\) of the feature:
\[
I(f_t)=\frac{1}{C(C - 1)}\sum_i\sum_{j\neq i}\left|C_v^i\frac{\mu_i^t-\mu^t}{\sigma_i^t-\bar{\sigma}^t}-C_v^j\frac{\mu_j^t-\mu^t}{\sigma_j^t-\bar{\sigma}^t}\right|