Abstract: We study the problem of balancing effectiveness and efficiency in automated feature selection. Feature selection aims to find an optimal feature subset from a large feature space. After exploring many feature selection methods, we observe a computational dilemma: 1) traditional feature selection (e.g., mRMR) is mostly efficient, but has difficulty identifying the best subset; 2) the emerging reinforced feature selection automatically navigates the feature space to search for the best subset, but is usually inefficient. Are automation and efficiency always at odds with each other? Can we bridge the gap between effectiveness and efficiency under automation? Motivated by this dilemma, we aim to develop a novel feature space navigation method. In our preliminary work, we leveraged interactive reinforcement learning to accelerate feature selection through external trainer-agent interaction. That work can be significantly improved by modeling the structured knowledge of its downstream task (e.g., a decision tree) as learning feedback. In this journal version, we propose a novel interactive and closed-loop architecture to simultaneously model interactive reinforcement learning (IRL) and decision tree feedback (DTF). Specifically, IRL creates an interactive feature selection loop, and DTF feeds structured feature knowledge back into the loop. DTF improves IRL in two respects. First, the tree-structured feature hierarchy generated by the decision tree is leveraged to improve state representation. In particular, we represent the selected feature subset as an undirected graph of feature-feature correlations and a directed tree of decision features. We propose a new embedding method that enables a Graph Convolutional Network (GCN) to jointly learn the state representation from both the graph and the tree. Second, the tree-structured feature hierarchy is exploited to develop a new reward scheme. In particular, we personalize the reward assignment of agents based on decision tree feature importance. In addition, observing that agents’ actions can also serve as feedback, we devise another new reward scheme that weighs and assigns rewards based on each agent’s selection frequency ratio in historical action records. Finally, we present extensive experiments with real-world datasets to demonstrate the improved performance of our method.
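The two reward schemes mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the proportional split below is an assumption, and all function names (`split_reward`, `importance_weighted_rewards`, `selection_frequency_ratios`, `frequency_weighted_rewards`) are hypothetical. The sketch assumes one agent per feature, a binary action per agent (1 = select the feature), and that only agents that selected their feature in the current step share the reward.

```python
from typing import List, Sequence


def split_reward(total_reward: float, weights: Sequence[float]) -> List[float]:
    """Divide total_reward in proportion to non-negative weights (assumed rule)."""
    norm = sum(weights)
    if norm <= 0.0:
        # No agent carries any weight, so nobody is rewarded.
        return [0.0 for _ in weights]
    return [total_reward * w / norm for w in weights]


def importance_weighted_rewards(total_reward: float,
                                importances: Sequence[float],
                                actions: Sequence[int]) -> List[float]:
    """Weight each selecting agent by the decision tree importance of its feature."""
    if len(importances) != len(actions):
        raise ValueError("importances and actions must have the same length")
    weights = [imp if act == 1 else 0.0 for imp, act in zip(importances, actions)]
    return split_reward(total_reward, weights)


def selection_frequency_ratios(history: Sequence[Sequence[int]],
                               n_agents: int) -> List[float]:
    """Fraction of past steps in which each agent chose to select its feature."""
    if not history:
        return [0.0] * n_agents
    counts = [0] * n_agents
    for step_actions in history:
        for agent, act in enumerate(step_actions):
            counts[agent] += act
    return [count / len(history) for count in counts]


def frequency_weighted_rewards(total_reward: float,
                               history: Sequence[Sequence[int]],
                               actions: Sequence[int]) -> List[float]:
    """Weight each selecting agent by its historical selection frequency ratio."""
    ratios = selection_frequency_ratios(history, len(actions))
    weights = [r if act == 1 else 0.0 for r, act in zip(ratios, actions)]
    return split_reward(total_reward, weights)


if __name__ == "__main__":
    # Hypothetical importances, e.g. taken from a trained decision tree.
    importances = [0.5, 0.3, 0.2]
    history = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 0, 1]]
    print(importance_weighted_rewards(0.7, importances, [1, 0, 1]))
    print(frequency_weighted_rewards(1.0, history, [1, 1, 0]))
```

In practice the importances might come from a fitted decision tree on the currently selected subset; here they are passed in as plain numbers so the sketch stays self-contained.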

Catch: Collaborative Feature Set Search for Automated Feature Engineering

Learning a Data-Driven Policy Network for Pre-Training Automated Feature Engineering

Visible-hidden Hybrid Automatic Feature Engineering Via Multi-Agent Reinforcement Learning

Toward Efficient Automated Feature Engineering

Automated Feature Selection: A Reinforcement Learning Perspective

SAFE: Scalable Automatic Feature Engineering Framework for Industrial Tasks

DAFEE: A Scalable Distributed Automatic Feature Engineering Algorithm for Relational Datasets

OpenFE: Automated Feature Generation with Expert-level Performance

Automating Feature Subspace Exploration via Multi-Agent Reinforcement Learning

AEFE: Automatic Embedded Feature Engineering for Categorical Features

Evolutionary Automated Feature Engineering

FeatureEnVi: Visual Analytics for Feature Engineering Using Stepwise Selection and Semi-Automatic Extraction Approaches

Feature Augmentation with Reinforcement Learning

Feature Engineering: The Method of Detecting Learner Behavior Patterns in Learning Analytics Field

FeatNavigator: Automatic Feature Augmentation on Tabular Data

MetaFS: An Effective Wrapper Feature Selection via Meta Learning

Large Language Models for Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering

Interactive Reinforcement Learning for Feature Selection with Decision Tree in the Loop

Federated Automated Feature Engineering

Enhancing Tabular Data Optimization with a Flexible Graph-based Reinforced Exploration Strategy

An Interactive Feature Selection Method Based on Learning-from-crowds