Gradient Boosting Trees and Large Language Models for Tabular Data Few-Shot Learning

Carlos Huertas
2024-11-07
Abstract: Large Language Models (LLMs) have brought numerous new applications to Machine Learning (ML). In the context of tabular data (TD), recent studies show that TabLLM is a very powerful mechanism for few-shot learning (FSL) applications, even though gradient boosting decision trees (GBDT) have historically dominated the TD field. In this work we demonstrate that although LLMs are a viable alternative, the evidence suggests that the baselines used to gauge performance can be improved. We replicated public benchmarks, and our methodology improves LightGBM by 290%; this is mainly driven by forcing node splitting with few samples, a critical step in FSL with GBDT. Our results show an advantage for TabLLM with 8 or fewer shots, but as the number of samples increases, GBDT provides competitive performance at a fraction of the runtime. For other real-life applications with a vast number of samples, we found FSL still useful for improving model diversity, and when combined with ExtraTrees it provides strong resilience to overfitting. Our proposal was validated in an ML competition setting, ranking first place.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper compares Gradient Boosting Decision Trees (GBDT) with Large Language Models (LLMs) on Tabular Data (TD) in the Few-Shot Learning (FSL) setting, and attempts to improve the GBDT baseline. Specifically, it addresses the following problems:

1. **Poor performance of GBDT in FSL**:
   - Previous studies have shown that LLMs perform well in few-shot learning, while GBDT performs poorly when very few samples are available. By adjusting key GBDT parameters (such as `min_data_in_leaf`) so that nodes can still split with few samples, the paper significantly improves GBDT performance on FSL tasks.
2. **Establishing a fair baseline comparison**:
   - An LLM may have memorized certain datasets, so its strong results on some tasks may not reflect true few-shot learning ability. The paper therefore emphasizes the need for a fair baseline when evaluating performance. By optimizing the GBDT parameter configuration, the author shows that a properly tuned GBDT can compete with LLMs, and even outperform them in some cases.
3. **Exploring the performance of different models under different sample sizes**:
   - The paper analyzes how GBDT and LLM performance change as the number of samples increases. With very few samples (such as 4 to 8), the LLM has an advantage; as the number of samples increases, GBDT provides competitive performance at a shorter running time.
4. **The value of FSL in practical applications**:
   - The paper also shows the value of FSL in practical applications with large-scale data, where it can increase model diversity and improve robustness to overfitting. For example, in the FedCSIS 2024 Data Science Challenge, the author used an FSL strategy to build multiple orthogonal models and won first place.

### Summary

By optimizing the GBDT parameter configuration, this paper significantly improves GBDT performance in few-shot learning and enables a fair comparison with LLMs. The results show that a properly tuned GBDT performs well on few-shot learning tasks; as the number of samples increases, it is competitive with the LLM while being more computationally efficient. The paper also emphasizes the importance and potential of FSL in practical applications.
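
The role of `min_data_in_leaf` in few-shot training, mentioned above, can be illustrated with a small sketch. LightGBM's default for `min_data_in_leaf` is 20, so with only 8 training rows no candidate split can leave 20 rows on each side and the tree never grows. The code below is a toy, pure-Python stand-in for a single-feature split search, not LightGBM's histogram-based algorithm; the helper `best_split`, the toy data, and the choice of `min_data_in_leaf=1` are illustrative assumptions rather than the paper's exact configuration.

```python
from typing import Optional, Sequence, Tuple


def best_split(
    xs: Sequence[float],
    ys: Sequence[float],
    min_data_in_leaf: int,
) -> Optional[Tuple[float, float]]:
    """Hypothetical helper: return (threshold, sse_reduction) for the best
    split on one feature, or None when no split leaves at least
    `min_data_in_leaf` rows on each side."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)

    def sse(values):
        # Sum of squared errors around the mean (0.0 for an empty list).
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    total = sse([y for _, y in pairs])
    best = None
    for i in range(1, n):
        # A threshold cannot separate two identical feature values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        # Leaf-size constraint: both children need enough rows.
        if i < min_data_in_leaf or n - i < min_data_in_leaf:
            continue
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = total - sse(left) - sse(right)
        if best is None or gain > best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, gain)
    return best


# An 8-shot toy problem: the label is 1 exactly when the feature exceeds 4.5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

default_like = best_split(xs, ys, min_data_in_leaf=20)  # mirrors LightGBM's default of 20
few_shot = best_split(xs, ys, min_data_in_leaf=1)       # illustrative relaxed setting

print(default_like)  # None
print(few_shot)      # (4.5, 2.0)
```

With the default-like constraint the search returns no split, so a model built this way can only predict a constant; relaxing the constraint recovers the obvious threshold. In LightGBM itself the corresponding knob is the `min_data_in_leaf` parameter named in the summary; which other parameters need relaxing for a given dataset is left open here.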