LinXGBoost: Extension of XGBoost to Generalized Local Linear Models

Laurent de Vito
DOI: https://doi.org/10.48550/arXiv.1710.03634
2017-10-10
Abstract: XGBoost is often presented as the algorithm that wins every ML competition. Surprisingly, this is true even though its predictions are piecewise constant. This might be justified in high-dimensional input spaces, but when the number of features is low, a piecewise linear model is likely to perform better. XGBoost was extended into LinXGBoost, which stores a linear model at each leaf. This extension, equivalent to piecewise regularized least-squares, is particularly attractive for regression of functions that exhibit jumps or discontinuities. Those functions are notoriously hard to regress. Our extension is compared to the vanilla XGBoost and Random Forest in experiments on both synthetic and real-world data sets.
Machine Learning
What problem does this paper attempt to address?
The paper addresses a limitation of XGBoost in low-dimensional input spaces: because its predictions are piecewise constant, it may not deliver the best predictive performance there, and the limitation is most visible on functions with jumps or discontinuities. The author proposes LinXGBoost, an extension of XGBoost that stores a linear model in each leaf node instead of a constant value. Specifically, the paper focuses on the following aspects:

1. **Improving prediction performance**: With a linear model in each leaf, LinXGBoost can fit the data within a leaf more closely than a constant can, which is expected to help most when the number of features is low.
2. **Reducing the number of trees**: Experiments show that LinXGBoost usually needs only a small number of trees (generally fewer than 5) to reach the same accuracy as XGBoost with hundreds of trees. This improves computational efficiency and reduces model complexity.
3. **Dealing with discontinuities**: LinXGBoost is especially suited to regressing functions with jumps or discontinuities, which are hard to fit with piecewise constant models.
4. **Comparing with other methods**: The paper compares LinXGBoost with the original XGBoost and Random Forest in experiments on several synthetic and real-world data sets, showing its advantages in certain scenarios.

In summary, the paper aims to improve predictive performance in low-dimensional input spaces by placing linear models in the leaves of XGBoost trees.
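The contrast between constant and linear leaves can be illustrated with a minimal sketch. It is not the LinXGBoost implementation: it fits a single split on one feature, and every name in it (`fit_constant_leaf`, `fit_linear_leaf`, `best_stump`, `target`, `lam`) is a hypothetical chosen for illustration. The linear leaf is a ridge-regularized least-squares fit whose penalty also covers the intercept, which is an assumption rather than a detail taken from the paper. The target function is an invented noiseless example with a jump at x = 0.5.

```python
def fit_constant_leaf(xs, ys, lam):
    # Constant leaf. With the loss taken as half the squared error and a
    # zero initial prediction, the XGBoost-style optimal weight is
    # sum(y) / (n + lam).
    w = sum(ys) / (len(ys) + lam)
    return lambda x: w


def fit_linear_leaf(xs, ys, lam):
    # Linear leaf y ~ a + b*x, fitted by ridge-regularized least squares.
    # Assumption: the penalty lam applies to both a and b.
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Solve the 2x2 normal equations in closed form.
    det = (n + lam) * (sxx + lam) - sx * sx
    a = (sy * (sxx + lam) - sx * sxy) / det
    b = ((n + lam) * sxy - sx * sy) / det
    return lambda x: a + b * x


def sse(model, xs, ys):
    # Sum of squared errors of a leaf model on its own points.
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))


def best_stump(xs, ys, fit_leaf, lam=1e-6, min_leaf=2):
    # Exhaustive search for the single split that minimizes training SSE.
    # xs must be sorted in ascending order.
    best = None
    for k in range(min_leaf, len(xs) - min_leaf + 1):
        left = fit_leaf(xs[:k], ys[:k], lam)
        right = fit_leaf(xs[k:], ys[k:], lam)
        err = sse(left, xs[:k], ys[:k]) + sse(right, xs[k:], ys[k:])
        if best is None or err < best[0]:
            best = (err, 0.5 * (xs[k - 1] + xs[k]))
    return best  # (training SSE, split threshold)


def target(x):
    # Hypothetical test function: slope 2 with a jump of -3 at x = 0.5.
    return 2.0 * x if x < 0.5 else 2.0 * x - 3.0


xs = [i / 40 for i in range(40)]
ys = [target(x) for x in xs]

const_err, const_thr = best_stump(xs, ys, fit_constant_leaf)
lin_err, lin_thr = best_stump(xs, ys, fit_linear_leaf)

print(f"constant leaves: SSE={const_err:.4f}, split at {const_thr:.4f}")
print(f"linear leaves:   SSE={lin_err:.2e}, split at {lin_thr:.4f}")
```

On this toy problem, one split with linear leaves places the threshold at the jump and fits each side almost exactly. Constant leaves leave a residual because of the slope within each leaf, and a boosted ensemble would need further trees to reduce it.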