Deep Feature Embedding for Tabular Data

Yuqian Wu, Hengyi Luo, Raymond S. T. Lee
2024-08-30
Abstract: Tabular data learning has extensive applications in deep learning, but existing embedding techniques for numerical and categorical features have limitations, such as the inability to capture complex relationships and the need for feature engineering. This paper proposes a novel deep embedding framework that leverages lightweight deep neural networks to generate effective feature embeddings for tabular data in machine learning research. For numerical features, a two-step feature expansion and deep transformation technique is used to capture copious semantic information. For categorical features, a unique identification vector for each entity is retrieved from a compact lookup table, and a parameterized deep embedding function unifies the embedding dimensions by transforming that vector into an embedding vector using a deep neural network. Experiments are conducted on real-world datasets for performance evaluation.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the limitations of existing tabular data embedding techniques for numerical and categorical features, such as the inability to capture complex relationships and the need for extensive feature engineering. Specifically, linear scaling embeddings for numerical features may be ineffective, while embedding tables for categorical features can be very large, leading to excessive model parameters and low training efficiency. The paper therefore proposes a deep embedding framework that uses lightweight deep neural networks to generate effective feature embeddings, aiming to improve model performance on tabular data. The main contributions of the paper are:
1. A two-step feature expansion and deep transformation technique for numerical feature embedding.
2. A parameter-efficient and effective deep decomposition embedding technique for categorical features.
3. A unified embedding framework that simplifies input, enabling end-to-end training without extensive feature engineering.
4. Extensive experiments to validate the effectiveness and efficiency of the proposed deep embedding methods.
Through these improvements, the paper aims to address the shortcomings of existing embedding techniques and enhance the performance of deep learning on tabular data.
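The two embedding paths described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the elementwise expansion `x * w + b`, the base-10 digit decomposition of category ids, the layer sizes, and every function name (`embed_numerical`, `embed_categorical`, `id_to_digits`, `init_mlp`, `mlp`) are assumptions chosen only to show the general shape of "expand, then transform" for numerical features and "compact lookup, then deep embedding function" for categorical ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Hypothetical helper: random weights for a small MLP with the given layer sizes."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, weights):
    """Apply a ReLU MLP; the final layer is left linear."""
    for i, (W, b) in enumerate(weights):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

def embed_numerical(x, expand_w, expand_b, weights):
    """Assumed two-step scheme: expand each scalar to a vector, then transform it."""
    expanded = x[:, None] * expand_w + expand_b   # step 1: (batch,) -> (batch, k)
    return mlp(expanded, weights)                 # step 2: (batch, k) -> (batch, d)

def id_to_digits(ids, base, n_digits):
    """Decompose integer ids into base-`base` digits (least significant first)."""
    digits = []
    for _ in range(n_digits):
        digits.append(ids % base)
        ids = ids // base
    return np.stack(digits, axis=1)               # (batch, n_digits)

def embed_categorical(ids, tables, base, weights):
    """Assumed decomposition: look up one small table per digit, concatenate, transform."""
    n_digits = tables.shape[0]
    digits = id_to_digits(ids, base, n_digits)
    parts = [tables[p, digits[:, p]] for p in range(n_digits)]
    ident = np.concatenate(parts, axis=1)         # identification vector per entity
    return mlp(ident, weights)                    # unify to the shared embedding size

d = 8  # shared embedding size (assumed)

num_emb = embed_numerical(
    np.array([0.5, -1.2, 3.0]),
    expand_w=rng.normal(size=4), expand_b=np.zeros(4),
    weights=init_mlp([4, 16, d]),
)

# 3 digit positions x 10 rows cover 1000 categories with 30 table rows in total.
base, n_digits, code_dim = 10, 3, 4
tables = rng.normal(size=(n_digits, base, code_dim))
cat_emb = embed_categorical(
    np.array([7, 42, 999]), tables, base,
    weights=init_mlp([n_digits * code_dim, 16, d]),
)

print(num_emb.shape, cat_emb.shape)
```

In this sketch the categorical path stores 30 table rows rather than 1000, which is the kind of parameter saving the summary attributes to a compact lookup table. Both paths emit vectors of the same size `d`, so they could feed a single downstream model.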