Abstract: XGBoost is often presented as the algorithm that wins every ML competition. Surprisingly, this is true even though its predictions are piecewise constant. This might be justified in high-dimensional input spaces, but when the number of features is low, a piecewise linear model is likely to perform better. XGBoost was extended into LinXGBoost, which stores a linear model at each leaf. This extension, equivalent to piecewise regularized least squares, is particularly attractive for regressing functions that exhibit jumps or discontinuities. Such functions are notoriously hard to regress. Our extension is compared to vanilla XGBoost and Random Forest in experiments on both synthetic and real-world data sets.
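The abstract says each leaf holds a regularized linear model. The sketch below shows one way such leaf coefficients could be computed. It assumes the usual XGBoost second-order approximation of the loss, with per-sample gradients `g` and Hessians `h`, and an L2 penalty `lam` applied to every coefficient, including the intercept. LinXGBoost's exact regularization may differ. The function names are hypothetical. Under these assumptions, the leaf weights solve a Hessian-weighted ridge system, and the minimized objective has a closed form.

```python
import numpy as np


def _leaf_system(X_leaf, g, h, lam):
    """Build M and b so that the leaf weights solve M w = -b (intercept included).

    Assumption: penalty 0.5 * lam * ||w||^2 on all coefficients, intercept too.
    """
    A = np.column_stack([np.ones(len(g)), X_leaf])          # design matrix with intercept column
    M = A.T @ (h[:, None] * A) + lam * np.eye(A.shape[1])   # sum_i h_i x_i x_i^T + lam * I
    b = A.T @ g                                              # sum_i g_i x_i
    return M, b


def linear_leaf_weights(X_leaf, g, h, lam=1.0):
    """Minimizer of sum_i [g_i x_i^T w + 0.5 h_i (x_i^T w)^2] + 0.5 lam ||w||^2."""
    M, b = _leaf_system(X_leaf, g, h, lam)
    return np.linalg.solve(M, -b)  # solve the system instead of forming an inverse


def linear_leaf_objective(X_leaf, g, h, lam=1.0):
    """Value of that objective at its minimizer: -0.5 * b^T M^{-1} b."""
    M, b = _leaf_system(X_leaf, g, h, lam)
    return -0.5 * b @ np.linalg.solve(M, b)
```

Take squared loss with a zero starting prediction (`g = -y`, `h = 1`). In that case the weights reduce to ordinary ridge regression on the leaf's samples. With no features at all (`X_leaf` of shape `(n, 0)`), they reduce to XGBoost's constant leaf weight, -G/(H + λ). A candidate split could then be scored as in XGBoost: the parent objective minus the two child objectives, minus a complexity penalty γ. That carry-over is an assumption here, not a quotation of the paper's rule.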

What problem does this paper attempt to address?

The main problem this paper addresses is that, in low-dimensional input spaces, standard XGBoost may not deliver the best predictive performance because its predictions are piecewise constant. The limitation is most apparent for functions with jumps or discontinuities. To address it, the author proposes LinXGBoost, an extension of XGBoost that stores a linear model in each leaf node rather than a constant value. This lets the model fit functions with jumps or discontinuities more closely and perform better when the input space is low-dimensional. Specifically, the paper focuses on the following aspects:

1. **Improving prediction performance**: With a linear model in each leaf, LinXGBoost can predict more accurately in low-dimensional input spaces, especially for functions with jumps or discontinuities.
2. **Reducing the number of trees**: The experiments show that LinXGBoost usually needs only a few trees (generally fewer than 5) to match the accuracy that XGBoost reaches with hundreds of trees. This can improve computational efficiency and yields a less complex model.
3. **Handling discontinuities**: LinXGBoost is especially suited to regressing functions with jumps or discontinuities, which piecewise constant models typically struggle to fit.
4. **Comparison with other methods**: The paper compares LinXGBoost with vanilla XGBoost and Random Forest on several synthetic and real-world datasets, showing its advantages in certain scenarios.

In summary, the paper aims to improve predictive performance in low-dimensional input spaces, particularly for functions with jumps or discontinuities, by placing linear models in the leaf nodes of XGBoost.
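The toy comparison below makes the contrast between piecewise constant and piecewise linear leaves concrete. It uses an invented 1-D function with a jump at x = 0.5. The code fits a single split (a stump) with either constant leaves or unregularized least-squares linear leaves, then keeps the threshold with the lowest training MSE. It is an illustration only. It does not reproduce LinXGBoost's regularized second-order leaf fit or the paper's experiments, and the names `fit_leaf` and `best_stump_mse` are hypothetical.

```python
import numpy as np


def fit_leaf(x, y, linear):
    """Predictions of one leaf: a constant (mean) or a least-squares line in x."""
    if linear:
        A = np.column_stack([np.ones_like(x), x])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # min-norm solution if underdetermined
        return A @ coef
    return np.full_like(y, y.mean())


def best_stump_mse(x, y, linear):
    """Lowest training MSE over all single-threshold splits x < t vs x >= t."""
    best = np.inf
    for t in np.unique(x)[1:]:  # each threshold leaves both sides non-empty
        left, right = x < t, x >= t
        pred = np.empty_like(y)
        pred[left] = fit_leaf(x[left], y[left], linear)
        pred[right] = fit_leaf(x[right], y[right], linear)
        best = min(best, np.mean((y - pred) ** 2))
    return best


# Invented target: slope 1 on both sides, with a jump of +1 at x = 0.5
x = np.linspace(0.0, 1.0, 101)
y = np.where(x < 0.5, x, x + 1.0)

const_mse = best_stump_mse(x, y, linear=False)
lin_mse = best_stump_mse(x, y, linear=True)
print(f"constant leaves: {const_mse:.4f}, linear leaves: {lin_mse:.2e}")
```

With the split placed at the jump, the linear leaves reproduce both pieces exactly, so their error is zero up to rounding. Constant leaves cannot follow the slope inside each piece and would need further splits to reduce their error.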

LinXGBoost: Extension of XGBoost to Generalized Local Linear Models