Abstract: Feature selection improves the AI-readiness of data by eliminating redundant features. Prior research falls into two primary categories: i) supervised feature selection (SFS), which identifies the optimal feature subset based on each feature's relevance to the target variable; ii) unsupervised feature selection (UFS), which reduces the dimensionality of the feature space by capturing the essential information within the feature set rather than using the target variable. However, SFS approaches are time-consuming and generalize poorly because they depend on the target variable and on downstream ML tasks, while UFS methods are limited because the resulting feature space is latent and untraceable. To address these challenges, we introduce a feature selection framework that is guided by knockoff features and optimized through reinforcement learning to identify an effective feature subset. In detail, our method generates "knockoff" features that replicate the distribution and characteristics of the original features but are independent of the target variable. Each feature is then assigned a pseudo label based on its correlation with all the knockoff features, which serves as a novel metric for feature evaluation. Our approach uses these pseudo labels to guide the feature selection process in three ways, all optimized by a single reinforced agent: 1) A deep Q-network, pre-trained with the original features and their corresponding pseudo labels, is employed to make exploration during feature selection more effective. 2) We introduce unsupervised rewards that evaluate feature subset quality based on the pseudo labels and the feature space reconstruction loss, reducing dependence on the target variable. 3) A new ε-greedy strategy incorporates insights from the pseudo labels to make the feature selection process more effective.
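
The knockoff and pseudo-label steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and everything below is an assumption made for illustration: knockoffs are approximated by independently shuffling each column (a simple stand-in that keeps each feature's marginal distribution and removes any link to a target), the pseudo label is a median threshold on the mean absolute Pearson correlation with all knockoff columns, and the guided ε-greedy rule simply takes the pseudo label as the action during exploration. The function names `permutation_knockoffs`, `pseudo_labels`, and `guided_epsilon_greedy` are hypothetical.

```python
import random
import statistics


def permutation_knockoffs(X, rng):
    """Assumed stand-in for a knockoff generator: shuffle each column independently.

    Each knockoff column keeps the values (and so the marginal distribution)
    of its original column, but row alignment with any target is destroyed.
    """
    n_rows, n_cols = len(X), len(X[0])
    columns = [[row[j] for row in X] for j in range(n_cols)]
    for col in columns:
        rng.shuffle(col)
    return [[columns[j][i] for j in range(n_cols)] for i in range(n_rows)]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def pseudo_labels(X, X_knock):
    """Score each original feature by its mean |correlation| with all knockoffs.

    The binary label (score >= median) is an illustrative assumption; the
    abstract does not state how the correlation is turned into a label.
    """
    n_cols = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_cols)]
    knock_cols = [[row[j] for row in X_knock] for j in range(n_cols)]
    scores = [
        statistics.fmean([abs(pearson(c, k)) for k in knock_cols]) for c in cols
    ]
    threshold = statistics.median(scores)
    labels = [1 if s >= threshold else 0 for s in scores]
    return scores, labels


def guided_epsilon_greedy(q_values, pseudo_label, epsilon, rng):
    """Hypothetical guided exploration for one feature.

    q_values holds the Q-values for the two actions (0 = drop, 1 = keep).
    With probability epsilon the agent follows the pseudo label instead of
    acting uniformly at random; otherwise it acts greedily on the Q-values.
    """
    if rng.random() < epsilon:
        return pseudo_label
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Small synthetic example: 50 samples, 4 features.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(50)]
X_knock = permutation_knockoffs(X, rng)
scores, labels = pseudo_labels(X, X_knock)
actions = [guided_epsilon_greedy([0.2, 0.5], lab, 0.3, rng) for lab in labels]
```

In this sketch the pseudo labels play the role the abstract assigns them only in the exploration step; pre-training a deep Q-network on them and building the unsupervised reward would need further components that are not shown here.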

A-SFS: Semi-supervised Feature Selection based on Multi-task Self-supervision

Semi-supervised feature selection based on structure and constraints preserving

U^2F^2S^2 : Uncovering Feature-level Similarities for Unsupervised Feature Selection

AFS: An Attention-based mechanism for Supervised Feature Selection

GAEFS: Self-supervised Graph Auto-encoder enhanced Feature Selection

Joint Semi-Supervised Feature Selection and Classification Through Bayesian Approach

Self-adjusted graph based semi-supervised embedded feature selection

Self-paced Semi-Supervised Feature Selection with Application to Multi-Modal Alzheimer’s Disease Classification

Unsupervised soft-label feature selection

Feature-Aligned Stacked Autoencoder: A Novel Semisupervised Deep Learning Model for Pattern Classification of Industrial Faults

Semi-supervised feature selection based on discernibility matrix and mutual information

Semi-Supervised Multiview Feature Selection With Adaptive Graph Learning

Knockoff-Guided Feature Selection via A Single Pre-trained Reinforced Agent

Binary Label Learning for Semi-supervised Feature Selection

A Convex Formulation for Semi-Supervised Multi-Label Feature Selection

Generalized Semi-Supervised Learning via Self-Supervised Feature Adaptation

Leveraging Local Density Decision Labeling and Fuzzy Dependency for Semi-supervised Feature Selection

A Robust Semi-Supervised Broad Learning System Guided by Ensemble-Based Self-Training

Efficient multi-view semi-supervised feature selection

An Adaptive Semisupervised Feature Analysis for Video Semantic Recognition

Autoencoder Inspired Unsupervised Feature Selection