Beyond Discrete Selection: Continuous Embedding Space Optimization for Generative Feature Selection

Meng Xiao, Dongjie Wang, Min Wu, Pengfei Wang, Yuanchun Zhou, Yanjie Fu

2023-09-15

Abstract: The goal of feature selection, comprising filter, wrapper, and embedded approaches, is to find the optimal feature subset for designated downstream tasks. Nevertheless, current feature selection methods are limited by: 1) the selection criteria of these methods are varied for different domains, making them hard to generalize; 2) the selection performance of these approaches drops significantly when processing high-dimensional feature space coupled with small sample size. In light of these challenges, we pose the question: can selected feature subsets be more robust, accurate, and input dimensionality agnostic? In this paper, we reformulate the feature selection problem as a deep differentiable optimization task and propose a new research perspective: conceptualizing discrete feature subsetting as continuous embedding space optimization. We introduce a novel and principled framework that encompasses a sequential encoder, an accuracy evaluator, a sequential decoder, and a gradient ascent optimizer. This comprehensive framework includes four important steps: preparation of features-accuracy training data, deep feature subset embedding, gradient-optimized search, and feature subset reconstruction. Specifically, we utilize reinforcement feature selection learning to generate diverse and high-quality training data and enhance generalization. By optimizing reconstruction and accuracy losses, we embed feature selection knowledge into a continuous space using an encoder-evaluator-decoder model structure. We employ a gradient ascent search algorithm to find better embeddings in the learned embedding space. Furthermore, we reconstruct feature selection solutions using these embeddings and select the feature subset with the highest performance for downstream tasks as the optimal subset.
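The gradient-optimized search and reconstruction steps described in the abstract can be sketched in a few lines. This is only a toy illustration under stated assumptions, not the authors' implementation: the evaluator is a hypothetical logistic function standing in for a learned accuracy evaluator, the decoder is a hypothetical per-feature linear threshold rather than the sequential decoder the paper describes, and all names (`evaluator`, `gradient_ascent_search`, `decode`, `step_size`) are invented for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def evaluator(z, w, b):
    """Toy stand-in for a learned evaluator: predicted accuracy in (0, 1)."""
    return sigmoid(z @ w + b)

def evaluator_grad(z, w, b):
    """Analytic gradient of the toy evaluator with respect to the embedding z."""
    s = evaluator(z, w, b)
    return s * (1.0 - s) * w

def gradient_ascent_search(z0, w, b, step_size=0.5, n_steps=20):
    """Move the embedding in the direction that raises predicted accuracy."""
    z = z0.copy()
    for _ in range(n_steps):
        z = z + step_size * evaluator_grad(z, w, b)
    return z

def decode(z, decoder_weights):
    """Toy decoder (assumption): one logit per feature, keep positive logits."""
    logits = decoder_weights @ z
    return np.flatnonzero(logits > 0)

rng = np.random.default_rng(0)
embed_dim, n_features = 8, 12

# Randomly initialized parameters play the role of trained weights here.
w = rng.normal(size=embed_dim)
b = 0.0
decoder_weights = rng.normal(size=(n_features, embed_dim))

# z0 plays the role of the embedding of an existing feature subset.
z0 = rng.normal(size=embed_dim)
z_star = gradient_ascent_search(z0, w, b)

before = evaluator(z0, w, b)
after = evaluator(z_star, w, b)
subset = decode(z_star, decoder_weights)

print("predicted accuracy before search:", before)
print("predicted accuracy after search: ", after)
print("reconstructed feature indices:   ", subset.tolist())
```

In the framework the abstract describes, several starting embeddings would be searched, each result decoded into a candidate subset, and the candidate with the best downstream performance kept; the sketch shows a single search only.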

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

This paper aims to make feature selection more robust and accurate while reducing its dependence on input dimensionality. It identifies two main challenges in current feature selection methods:

1. **Insufficient generalization ability**: Selection criteria vary across domains, so it is hard to find a single algorithm that performs best on cross-domain datasets. This raises a generalization problem: how can high accuracy be maintained across multiple domains?
2. **Performance degradation in high-dimensional feature spaces with small sample sizes**: In some fields (such as biomedicine), the number of features is large, but the sample size is limited by factors such as cost, privacy, and ethnic groups. In this setting, point-to-point distances tend to become nearly equal and data patterns become indistinguishable. High dimensionality also increases the complexity and time cost of feature selection, and with small samples the distribution is sparse and the data patterns are unclear. This raises a robustness problem: how can an effective, small feature subset be identified automatically while keeping the time cost unchanged across different input dimensions?

To address these challenges, the paper proposes a new perspective: transforming discrete feature subset selection into an optimization problem in a continuous embedding space. This makes a more effective gradient-based search possible, which improves the robustness and accuracy of feature selection. The paper also uses reinforcement learning to automatically generate high-quality training data, which enhances the model's generalization ability and robustness.
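The observation that point-to-point distances tend to become nearly equal in high dimensions can be checked numerically. The sketch below is an illustration on synthetic uniform data, not an experiment from the paper; the function name `relative_contrast`, the sample size, and the chosen dimensions are assumptions made for this example. It reports the relative contrast (max distance minus min distance, divided by min distance) over all sample pairs, which shrinks as the dimension grows.

```python
import numpy as np

def relative_contrast(n_samples, n_dims, rng):
    """(max - min) / min over all pairwise Euclidean distances of random points."""
    X = rng.random((n_samples, n_dims))
    # Squared distances via the Gram matrix: |x|^2 + |y|^2 - 2 x.y
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # Keep each unordered pair once (strict upper triangle).
    iu = np.triu_indices(n_samples, k=1)
    dist = np.sqrt(np.clip(d2[iu], 0.0, None))
    return (dist.max() - dist.min()) / dist.min()

rng = np.random.default_rng(42)
dims = [2, 20, 200, 2000]
contrasts = [relative_contrast(50, d, rng) for d in dims]

for d, c in zip(dims, contrasts):
    print(f"dims={d:5d}  relative contrast={c:.3f}")
```

With only 50 samples, the nearest and farthest pairs are almost equally far apart at the highest dimension, which is why distance-based selection criteria lose discriminative power in this regime.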
