Tabular Data: Deep Learning is Not All You Need

Ravid Shwartz-Ziv, Amitai Armon
DOI: https://doi.org/10.48550/arXiv.2106.03253
2021-11-23
Abstract: A key element in solving real-life data science problems is selecting the types of models to use. Tree ensemble models (such as XGBoost) are usually recommended for classification and regression problems with tabular data. However, several deep learning models for tabular data have recently been proposed, claiming to outperform XGBoost for some use cases. This paper explores whether these deep models should be a recommended option for tabular data by rigorously comparing the new deep models to XGBoost on various datasets. In addition to systematically comparing their performance, we consider the tuning and computation they require. Our study shows that XGBoost outperforms these deep models across the datasets, including the datasets used in the papers that proposed the deep models. We also demonstrate that XGBoost requires much less tuning. On the positive side, we show that an ensemble of deep models and XGBoost performs better on these datasets than XGBoost alone.
Machine Learning
What problem does this paper attempt to address?
This paper asks whether the recently proposed deep learning models for tabular data should be the recommended choice. Specifically, it examines two aspects:

1. **Model accuracy**: Are the new deep learning models more accurate than existing classical models (such as XGBoost), especially on datasets that did not appear in the papers that proposed them?
2. **Time cost of training and hyperparameter search**: How long does it take to train these deep learning models and tune their hyperparameters, compared with XGBoost?

### Background

Gradient-boosted decision tree (GBDT) models such as XGBoost have traditionally been recommended for tabular data because of their strong performance. In recent years, several studies have proposed deep learning models for tabular data and claimed that they can outperform XGBoost in some cases. Because there is no standard benchmark for tabular data, and because the degree of model optimization varies between studies, these models are difficult to compare and the conclusions remain unclear.

### Research purpose

The study evaluates whether these deep learning models should be the recommended choice for tabular data problems. It does so by systematically comparing the newly proposed deep learning models with XGBoost on multiple datasets, considering both their performance and the hyperparameter tuning and computing resources they require.

### Methods

1. **Dataset selection**: The study used 11 tabular datasets, 9 of which came from previous studies and 2 from Kaggle competitions.
2. **Experimental setup**: All models were trained and evaluated using the same hyperparameter tuning protocol. The researchers used Bayesian optimization (HyperOpt) to tune the hyperparameters of each model.
3. **Performance evaluation**: For binary classification problems, cross-entropy loss was used; for regression problems, the root-mean-square error (RMSE) was used. Each configuration was run four times, and the mean performance and standard error on the test set were reported.

### Main findings

1. **Model generalization ability**:
   - Deep learning models generally perform worse than XGBoost on datasets they were not originally evaluated on. XGBoost outperforms the deep learning models on 8 out of 11 datasets, and the difference is significant (p < 0.005).
   - Each deep learning model performs best on the datasets used in its original paper, but its performance drops significantly on other datasets.
2. **Model ensembling**:
   - An ensemble of deep learning models and XGBoost performs best on most datasets. On 7 out of 11 datasets, this ensemble significantly outperforms a single deep learning model (p < 0.005).
   - Ensembles of deep learning models alone, or of classical models alone, perform worse.
3. **Optimization difficulty**:
   - The hyperparameter search for XGBoost is much shorter than that for the deep learning models.
   - Deep learning models require more computing resources for training and hyperparameter tuning.

### Conclusion

Although deep learning models perform well on some specific datasets, XGBoost remains the recommended choice for tabular data problems overall. Combining XGBoost with deep learning models in an ensemble can improve performance further. However, deep learning models require more computing resources for training and hyperparameter tuning, which may be a limiting factor in practice. The study therefore concludes that deep learning alone is not all you need for tabular data problems.
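
The evaluation protocol described under Methods (cross-entropy for binary classification, RMSE for regression, mean and standard error over four runs) and the ensembling idea from the findings can be illustrated with a minimal sketch. This is not the authors' code. All function names are hypothetical, the toy numbers are made up for illustration, and the uniform averaging of predicted probabilities is an assumption, since the summary does not say how ensemble members are combined.

```python
import math
from statistics import mean, stdev


def cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy (log loss) for labels in {0, 1}."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error for regression targets."""
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mean_and_standard_error(scores):
    """Mean and standard error (sample std / sqrt(n)) over repeated runs."""
    return mean(scores), stdev(scores) / math.sqrt(len(scores))


def average_ensemble(prob_lists):
    """Uniformly average per-example probabilities from several models.

    Assumption: equal weights. The summary does not specify the weighting.
    """
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]


if __name__ == "__main__":
    # Hypothetical toy predictions from a tree model and a deep model.
    y_true = [1, 0, 1, 0]
    p_tree = [0.9, 0.2, 0.6, 0.4]
    p_deep = [0.7, 0.1, 0.8, 0.3]
    p_ens = average_ensemble([p_tree, p_deep])

    for name, probs in [("tree", p_tree), ("deep", p_deep), ("ensemble", p_ens)]:
        print(name, round(cross_entropy(y_true, probs), 4))

    # Hypothetical test losses from four runs of one configuration.
    run_losses = [0.41, 0.39, 0.40, 0.42]
    m, se = mean_and_standard_error(run_losses)
    print(f"mean={m:.4f} standard_error={se:.4f}")
```

Lower values are better for both metrics, so a comparison between models on one dataset amounts to comparing these means while keeping the standard errors in view.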