Abstract: Feature selection aims to identify the most pattern-discriminative feature subset. In prior literature, filter (e.g., backward elimination) and embedded (e.g., Lasso) methods have hyperparameters (e.g., top-K, score thresholding) and are tied to specific models, and are thus hard to generalize; wrapper methods search for a feature subset in a huge discrete space and are computationally costly. To transform the way feature selection is done, we regard a selected feature subset as a sequence of selection decision tokens and reformulate feature selection as a deep sequential generative learning task that distills feature knowledge and generates decision sequences. Our method includes three steps: (1) We develop a deep variational transformer model trained on a joint objective of sequential reconstruction, variational, and performance evaluator losses. The model distills feature selection knowledge and learns a continuous embedding space that maps feature selection decision sequences to embedding vectors associated with utility scores. (2) We leverage the trained feature subset utility evaluator as a gradient provider to guide the identification of the optimal feature subset embedding. (3) We decode the optimal feature subset embedding to autoregressively generate the best feature selection decision sequence with autostop. Extensive experimental results show that this generative perspective is effective and generic, requiring neither a large discrete search space nor expert-specific hyperparameters.
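Step (2) of the abstract, using the evaluator as a gradient provider, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: `toy_evaluator` is a hypothetical quadratic utility surface standing in for the trained evaluator, the embedding is two-dimensional, and gradients are estimated numerically rather than by backpropagation through a real model.

```python
from typing import Callable, List

Vector = List[float]


def toy_evaluator(e: Vector) -> float:
    """Hypothetical utility surface (assumption), peaked at (1.0, -2.0)."""
    return -((e[0] - 1.0) ** 2 + (e[1] + 2.0) ** 2)


def numerical_gradient(f: Callable[[Vector], float], e: Vector,
                       eps: float = 1e-5) -> Vector:
    """Central-difference estimate of the gradient of f at e."""
    grad = []
    for i in range(len(e)):
        up = e[:i] + [e[i] + eps] + e[i + 1:]
        down = e[:i] + [e[i] - eps] + e[i + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def gradient_ascent(f: Callable[[Vector], float], start: Vector,
                    step_size: float = 0.1, steps: int = 200) -> Vector:
    """Move an embedding uphill on the evaluator's predicted utility."""
    e = list(start)
    for _ in range(steps):
        g = numerical_gradient(f, e)
        e = [x + step_size * gx for x, gx in zip(e, g)]
    return e


seed = [0.0, 0.0]  # stand-in for the embedding of a known feature subset
best = gradient_ascent(toy_evaluator, seed)
```

In the method described above, the resulting embedding would then be passed to the decoder in step (3); here the search simply converges toward the peak of the toy surface.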

What problem does this paper attempt to address?

The problem that this paper attempts to solve is the limitations of existing feature selection methods. Specifically:

1. **Filter Methods**: These methods usually rank features by a scoring criterion (for example, the correlation between features and labels) and select the top \( k \) features as the optimal feature subset. However, they often overlook the relationships between features, are sensitive to data distribution, and do not learn, so they perform poorly on complex datasets.
2. **Embedded Methods**: These methods select features by jointly optimizing feature selection and the downstream prediction task. For example, LASSO shrinks feature coefficients by optimizing regression and regularization losses. But they rely on strong structural assumptions (such as sparse coefficients) and on specific downstream models (such as regression), which limits their flexibility.
3. **Wrapper Methods**: These methods treat feature selection as a search problem in a large discrete space of feature combinations, and usually use evolutionary or genetic algorithms that work together with a downstream machine-learning model. However, the search space grows exponentially with the number of features, so the computational cost is very high.

To overcome these problems, the authors propose a new deep sequential generative learning framework, which treats feature selection as a continuous optimization problem rather than a traditional discrete selection process. Specifically, they propose the following innovations:

- **Generative Perspective**: Regard feature selection as a deep sequential generative AI task, that is, convert the selection of feature subsets into a continuous optimization problem.
- **EOG (Embedding-Optimization-Generation) Framework**: It includes three steps:
  1. **Embedding**: Develop a variational transformer model that learns the feature subset embedding space by jointly optimizing the sequence reconstruction loss, the feature subset accuracy evaluation loss, and the variational distribution alignment loss (i.e., the Kullback-Leibler loss).
  2. **Optimization**: Use the trained feature subset utility evaluator to produce gradient information that guides the identification of the optimal feature subset embedding.
  3. **Generation**: Decode the optimal embedding vector and generate the best feature selection decision sequence autoregressively.

Through this method, the authors hope to select the optimal feature subset without a large-scale discrete search and to improve the effectiveness and generalization ability of feature selection.

### Formula Summary

The objective of feature selection can be formalized as:

\[ E^* = \arg\max_{E \in \mathcal{E}} M(X[\psi(E)], y), \qquad t^* = \psi(E^*) \]

where:

- \(\psi\) is the decoder, which generates a feature token sequence from any embedding \(E\);
- \(\mathcal{E}\) is the learned embedding space;
- \(E^*\) is the optimal feature subset embedding, and \(t^*\) is the feature token sequence decoded from it;
- \(M\) measures the performance of the downstream machine-learning task;
- \(X[\cdot]\) uses a mapping table to convert the feature token sequence into a feature subset of \(X\).

The authors report that this method improves both the effectiveness of feature selection and its generalization across data domains.
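The inner part of the objective, \(M(X[\psi(E)], y)\), can be made concrete with a small sketch that starts from an already decoded token sequence. The names are assumptions for illustration only: `EOS` is a hypothetical stop token (the source only says generation stops automatically), feature tokens are taken to be column indices, and `nearest_centroid_accuracy` is a toy stand-in for the downstream measure \(M\), not the evaluator used by the authors.

```python
from typing import Callable, List, Sequence

EOS = -1  # hypothetical stop token (assumption)


def tokens_to_subset(tokens: Sequence[int]) -> List[int]:
    """Read feature indices until the stop token, dropping repeats."""
    subset: List[int] = []
    for tok in tokens:
        if tok == EOS:
            break
        if tok not in subset:
            subset.append(tok)
    return subset


def select_columns(X: List[List[float]], subset: List[int]) -> List[List[float]]:
    """X[...] in the formula: keep only the selected feature columns."""
    return [[row[j] for j in subset] for row in X]


def nearest_centroid_accuracy(X: List[List[float]], y: List[int]) -> float:
    """Toy stand-in for M: training accuracy of a nearest-centroid classifier."""
    if not X or not X[0]:
        return 0.0  # an empty subset carries no information
    labels = sorted(set(y))
    centroids = {}
    for c in labels:
        rows = [r for r, t in zip(X, y) if t == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(row: List[float]) -> int:
        return min(labels, key=lambda c: sum(
            (a - b) ** 2 for a, b in zip(row, centroids[c])))

    return sum(predict(r) == t for r, t in zip(X, y)) / len(y)


def utility(tokens: Sequence[int], X: List[List[float]], y: List[int],
            M: Callable[[List[List[float]], List[int]], float]
            = nearest_centroid_accuracy) -> float:
    """Score a token sequence as M(X[subset], y)."""
    return M(select_columns(X, tokens_to_subset(tokens)), y)


# Feature 0 separates the classes; feature 1 does not.
X = [[0.0, 5.0], [0.1, -5.0], [1.0, 5.0], [0.9, -5.0]]
y = [0, 0, 1, 1]
score_informative = utility([0, EOS, 1], X, y)  # only feature 0 is used
score_noise = utility([1, EOS], X, y)           # only feature 1 is used
```

On this toy data the sequence that selects feature 0 scores higher than the one that selects feature 1, which is the ordering the search over embeddings is meant to exploit.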

Feature Selection as Deep Sequential Generative Learning
