Deep Feature Embedding for Tabular Data

Yuqian Wu, Hengyi Luo, Raymond S. T. Lee

2024-08-30

Abstract: Tabular data learning has extensive applications in deep learning, but existing embedding techniques for numerical and categorical features are limited, for example by their inability to capture complex relationships and their reliance on feature engineering. This paper proposes a novel deep embedding framework that leverages lightweight deep neural networks to generate effective feature embeddings for tabular data. For numerical features, a two-step feature expansion and deep transformation technique is used to capture rich semantic information. For categorical features, a unique identification vector for each entity is retrieved from a compact lookup table; a parameterized deep embedding function then makes the embedding dimensions uniform and transforms the vector into an embedding vector using a deep neural network. Experiments are conducted on real-world datasets for performance evaluation.

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses the limitations of existing embedding techniques for tabular data when handling numerical and categorical features, such as the inability to capture complex relationships and the need for extensive feature engineering. Specifically, linear scaling embeddings for numerical features may be ineffective, while embedding tables for categorical features can be very large, leading to excessive model parameters and low training efficiency. The paper therefore proposes a new deep embedding framework that uses lightweight deep neural networks to generate effective feature embeddings, aiming to improve the performance of deep learning models on tabular data.

The main contributions of the paper are:

1. A two-step feature expansion and deep transformation technique for numerical feature embedding.
2. A parameter-efficient and effective deep decomposition embedding technique for categorical features.
3. A unified embedding framework that simplifies input, enabling end-to-end training without extensive feature engineering.
4. Extensive experiments that validate the effectiveness and efficiency of the proposed deep embedding methods.

Together, these contributions aim to address the shortcomings of existing embedding techniques for tabular data.
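To make the two embedding ideas concrete, here is a minimal, untrained sketch in plain Python. It is not the authors' implementation: the summary does not give the architecture, so the layer sizes, the class names `NumericalEmbedding` and `CategoricalEmbedding`, and the helper functions are all hypothetical. In particular, the compact lookup is illustrated with a quotient-remainder decomposition of the category id, a technique swapped in here as a stand-in for the paper's decomposition.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Create a dense layer as (weights, bias) with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Apply a dense layer to the input vector x."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


class NumericalEmbedding:
    """Assumed two-step scheme: expand a scalar to a vector, then apply a small MLP."""

    def __init__(self, expand_dim, embed_dim, rng):
        self.expand = make_linear(1, expand_dim, rng)           # step 1: feature expansion
        self.hidden = make_linear(expand_dim, embed_dim, rng)   # step 2: deep transformation
        self.out = make_linear(embed_dim, embed_dim, rng)

    def __call__(self, value):
        expanded = linear(self.expand, [value])
        return linear(self.out, relu(linear(self.hidden, expanded)))


class CategoricalEmbedding:
    """Assumed compact lookup: two small tables indexed by quotient and remainder."""

    def __init__(self, cardinality, code_dim, embed_dim, rng):
        self.code_dim = code_dim
        self.num_buckets = math.isqrt(cardinality - 1) + 1  # ceil(sqrt(cardinality))
        self.q_table = [[rng.gauss(0, 1) for _ in range(code_dim)]
                        for _ in range(self.num_buckets)]
        self.r_table = [[rng.gauss(0, 1) for _ in range(code_dim)]
                        for _ in range(self.num_buckets)]
        self.hidden = make_linear(2 * code_dim, embed_dim, rng)
        self.out = make_linear(embed_dim, embed_dim, rng)

    def identification_vector(self, idx):
        # Each id maps to a distinct (quotient, remainder) pair of table rows.
        q, r = divmod(idx, self.num_buckets)
        return self.q_table[q] + self.r_table[r]  # list concatenation

    def lookup_parameters(self):
        return 2 * self.num_buckets * self.code_dim

    def __call__(self, idx):
        code = self.identification_vector(idx)
        return linear(self.out, relu(linear(self.hidden, code)))


if __name__ == "__main__":
    rng = random.Random(0)
    num = NumericalEmbedding(expand_dim=8, embed_dim=4, rng=rng)
    cat = CategoricalEmbedding(cardinality=10_000, code_dim=8, embed_dim=4, rng=rng)
    print(len(num(0.5)), len(cat(1234)))       # both embeddings have embed_dim entries
    print(cat.lookup_parameters(), 10_000 * 4)  # compact lookup vs. a full embedding table
```

With these illustrative sizes, the two small tables hold 2 × 100 × 8 = 1,600 values, compared with 10,000 × 4 = 40,000 for a conventional per-category embedding table of the same output width. Both feature types end up with vectors of the same length, which is what allows them to feed a single downstream model.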

Embeddings for Tabular Data: A Survey

Categorical Embeddings for Tabular Data using PyTorch

A Survey on Deep Tabular Learning

Beyond Deep Learning: An Evolutionary Feature Engineering Approach to Tabular Data Classification

Transfer Learning with Deep Tabular Models

SwitchTab: Switched Autoencoders Are Effective Tabular Learners

Enriching Tabular Data with Contextual LLM Embeddings: A Comprehensive Ablation Study for Ensemble Classifiers

TabularNet: A Neural Network Architecture for Understanding Semantic Structures of Tabular Data

TabSeq: A Framework for Deep Learning on Tabular Data via Sequential Ordering

AutoSrh: an Embedding Dimensionality Search Framework for Tabular Data Prediction

ReConTab: Regularized Contrastive Representation Learning for Tabular Data

Dense Representation Learning and Retrieval for Tabular Data Prediction

EmbeddingTree: Hierarchical Exploration of Entity Features in Embedding

TabGSL: Graph Structure Learning for Tabular Data Prediction

T2G-Former: Organizing Tabular Features into Relation Graphs Promotes Heterogeneous Feature Interaction

Deep Learning with Tabular Data: A Self-supervised Approach

Tabular Transformers for Modeling Multivariate Time Series

iTabNet: an improved neural network for tabular data and its application to predict socioeconomic and environmental attributes

Deep Tabular Data Modeling with Dual-Route Structure-Adaptive Graph Networks

Time Sequence Deep Learning Model for Ubiquitous Tabular Data with Unique 3D Tensors Manipulation