Arithmetic Feature Interaction Is Necessary for Deep Tabular Learning

Yi Cheng, Renjun Hu, Haochao Ying, Xing Shi, Jian Wu, Wei Lin
2024-03-19
Abstract: Until recently, the question of the effective inductive bias of deep models on tabular data has remained unanswered. This paper investigates the hypothesis that arithmetic feature interaction is necessary for deep tabular learning. To test this point, we create a synthetic tabular dataset with a mild feature interaction assumption and examine a modified transformer architecture enabling arithmetical feature interactions, referred to as AMFormer. Results show that AMFormer outperforms strong counterparts in fine-grained tabular data modeling, data efficiency in training, and generalization. This is attributed to its parallel additive and multiplicative attention operators and prompt-based optimization, which facilitate the separation of tabular samples in an extended space with arithmetically-engineered features. Our extensive experiments on real-world data also validate the consistent effectiveness, efficiency, and rationale of AMFormer, suggesting it has established a strong inductive bias for deep learning on tabular data. Code is available at
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper explores the question of effective inductive bias in deep learning on tabular data. Specifically, the authors hypothesize that arithmetic feature interaction is crucial for deep tabular learning. To validate this hypothesis, they create a synthetic tabular dataset and design a modified Transformer architecture, AMFormer, which supports arithmetic feature interaction.

### Main Findings

- **Performance of AMFormer**: Experimental results show that AMFormer significantly outperforms other strong baseline models in fine-grained tabular data modeling, training data efficiency, and generalization ability, with a particularly notable improvement in fine-grained modeling (up to 57%).
- **Importance of Arithmetic Feature Interaction**: Experiments on the synthetic dataset demonstrate the importance of arithmetic feature interaction for deep tabular learning.
- **Validation in Real-world Applications**: Extensive testing on four real-world datasets further validates the effectiveness and efficiency of AMFormer.

### Technical Details

- **Parallel Attention Mechanism**: AMFormer extracts meaningful arithmetic feature interactions through parallel additive and multiplicative attention operations and fuses these candidate features via down-sampling linear layers.
- **Prompt Optimization**: Prompt tokens are introduced to reduce the time and memory complexity of the self-attention mechanism and allow the model to capture cross-sample consistent feature interaction patterns, thereby preventing overfitting and improving robustness to data noise.

In summary, the paper theoretically and experimentally demonstrates the necessity of arithmetic feature interaction in deep tabular learning and proposes a new architecture, AMFormer, showcasing its superior performance across various tasks.
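
The parallel attention and prompt ideas described under Technical Details can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The summary does not give the exact formulas, so several things here are assumptions made for illustration: the multiplicative branch is computed as attention in log space (a weighted product of features), values are made positive with a ReLU plus a small `eps`, and the names `am_layer`, `prompts`, and `w_fuse` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def additive_attention(q, k, v):
    # Each output row is a weighted SUM of value rows.
    w = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return w @ v

def multiplicative_attention(q, k, v, eps=1e-6):
    # Assumed form: a weighted sum in log space, i.e. a weighted PRODUCT
    # of value rows. ReLU + eps keeps the log well defined (assumption).
    w = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    log_v = np.log(np.maximum(v, 0.0) + eps)
    return np.exp(w @ log_v)

def am_layer(x, prompts, w_fuse):
    # x:       (n_features, d) feature tokens of one sample
    # prompts: (n_prompts, d) sample-independent queries (hypothetical name)
    # w_fuse:  (2 * n_prompts, n_out) down-sampling weights (hypothetical name)
    add = additive_attention(prompts, x, x)          # (n_prompts, d)
    mul = multiplicative_attention(prompts, x, x)    # (n_prompts, d)
    candidates = np.concatenate([add, mul], axis=0)  # (2 * n_prompts, d)
    # Linear fusion over the token axis reduces the candidates to n_out tokens.
    return w_fuse.T @ candidates                     # (n_out, d)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))        # 8 feature tokens, embedding size 4
prompts = rng.normal(size=(3, 4))  # 3 prompt tokens
w_fuse = rng.normal(size=(6, 8))   # fuse 6 candidates back into 8 tokens
out = am_layer(x, prompts, w_fuse)
```

Because the queries are a fixed set of prompt tokens rather than the sample's own tokens, the attention score matrix in this sketch has shape `(n_prompts, n_features)` instead of `(n_features, n_features)`, which is one way to read the complexity reduction described above. The same prompts are shared by every sample, which corresponds to the cross-sample consistent interaction patterns the summary mentions.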