Gradient Boosting for Linear Mixed Models

Colin Griesbach, Benjamin Säfken, Elisabeth Waldmann
DOI: https://doi.org/10.48550/arXiv.2011.00947
2020-11-02
Abstract: Gradient boosting from the field of statistical learning is widely known as a powerful framework for estimation and selection of predictor effects in various regression models by adapting concepts from classification theory. Current boosting approaches also offer methods accounting for random effects and thus enable prediction of mixed models for longitudinal and clustered data. However, these approaches include several flaws resulting in unbalanced effect selection with falsely induced shrinkage and a low convergence rate on the one hand and biased estimates of the random effects on the other hand. We therefore propose a new boosting algorithm which explicitly accounts for the random structure by excluding it from the selection procedure, properly correcting the random effects estimates and in addition providing likelihood-based estimation of the random effects variance structure. The new algorithm offers an organic and unbiased fitting approach, which is shown via simulations and data examples.
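The core idea in the abstract (fixed effects compete in a component-wise selection, while the random structure is updated outside that selection and its variance is estimated by likelihood) can be illustrated with a minimal Python sketch. It assumes a random-intercept model y_ij = b0 + x_ij'beta + gamma_i + eps_ij with centred covariate columns. The function name boost_lmm, the step length nu, the full refit of the random intercepts in every iteration, and the single EM step for the two variances are choices made for this illustration; they are not taken from the paper, and the sketch omits the authors' correction of the random effects. Here n_iter stands in for the stopping iteration, which would normally be tuned.

```python
import random
from collections import defaultdict


def boost_lmm(X, y, groups, n_iter=300, nu=0.1):
    """Illustrative boosting for y_ij = b0 + x_ij' beta + gamma_i + eps_ij.

    Assumes centred columns of X. Fixed effects are selected component-wise;
    random intercepts are refitted every iteration and never enter the
    selection; the variances receive one EM step per iteration.
    """
    n, p = len(y), len(X[0])
    b0 = sum(y) / n                       # offset: overall mean
    beta = [0.0] * p
    sigma2, tau2 = 1.0, 1.0               # error / random-intercept variance
    sizes = defaultdict(int)
    for g in groups:
        sizes[g] += 1
    for _ in range(n_iter):
        # Residuals after removing the current fixed-effect fit.
        r = [yi - b0 - sum(b * x for b, x in zip(beta, row))
             for yi, row in zip(y, X)]
        sums = defaultdict(float)
        for ri, g in zip(r, groups):
            sums[g] += ri
        # 1) Random intercepts: conditional modes with ridge penalty sigma2/tau2.
        lam = sigma2 / tau2
        gamma = {g: sums[g] / (sizes[g] + lam) for g in sizes}
        # 2) Variance components: one EM step given the current fixed effects.
        v = {g: 1.0 / (sizes[g] / sigma2 + 1.0 / tau2) for g in sizes}
        u = [ri - gamma[g] for ri, g in zip(r, groups)]  # negative L2 gradient
        sigma2 = (sum(ui * ui for ui in u)
                  + sum(sizes[g] * v[g] for g in sizes)) / n
        tau2 = sum(gamma[g] ** 2 + v[g] for g in sizes) / len(sizes)
        # 3) Fixed effects: least-squares fit of u on each single covariate;
        #    only the best-fitting one is moved, by a small step nu.
        best = None
        for k in range(p):
            xk = [row[k] for row in X]
            bk = sum(a * c for a, c in zip(xk, u)) / sum(a * a for a in xk)
            rss = sum((ui - bk * a) ** 2 for ui, a in zip(u, xk))
            if best is None or rss < best[0]:
                best = (rss, k, bk)
        beta[best[1]] += nu * best[2]
    return b0, beta, gamma, sigma2, tau2


# Simulated data: 20 clusters of 10; x1 is informative, x2 is pure noise.
rng = random.Random(1)
X, y, groups = [], [], []
for i in range(20):
    g_i = rng.gauss(0, 1)                 # true random intercept
    for _ in range(10):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([x1, x2])
        y.append(2.0 * x1 + g_i + rng.gauss(0, 0.5))
        groups.append(i)
means = [sum(col) / len(X) for col in zip(*X)]
X = [[a - m for a, m in zip(row, means)] for row in X]  # centre columns
b0, beta, gamma, sigma2, tau2 = boost_lmm(X, y, groups)
```

Because the random intercepts are refitted in every iteration rather than competing with the covariates, they cannot "win" selection steps from the fixed effects, which is the kind of imbalance the abstract describes.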
Methodology
What problem does this paper attempt to address?
This paper addresses specific problems that arise when gradient boosting is applied to linear mixed models: unbalanced effect selection, falsely induced shrinkage, a low convergence rate, and biased random-effect estimates. Existing boosting approaches handle random effects poorly. In particular, the random-effect estimates can be correlated with observed covariates, which biases the estimates of both the fixed and the random effects. These flaws not only affect the predictive performance of the model but can also lead to inaccurate variable selection.

To overcome them, the authors propose a new boosting algorithm that combines successful concepts from gradient boosting and likelihood-based boosting. The new algorithm excludes the random structure from the selection procedure, corrects the random-effect estimates, and provides likelihood-based estimates of the random-effect variance structure. The result is an organic and unbiased fitting approach, which is verified through simulations and data examples. In short, the paper aims to develop a more effective gradient boosting algorithm that improves parameter estimation and variable selection in linear mixed models, especially for high-dimensional data with random effects.
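The bias described above arises because estimated random intercepts can absorb variation that belongs to covariates constant within a cluster. A minimal sketch of one possible correction, assumed here for illustration and not taken from the paper: regress the estimated intercepts on an intercept and a hypothetical cluster-level covariate w, keep only the residuals, and hand the removed part a + b*w back to the fixed effects. The function name correct_random_intercepts and the single-covariate setting are illustrative; with several cluster-level covariates the same idea becomes a projection onto the orthogonal complement of their design matrix.

```python
def correct_random_intercepts(gamma, w):
    """Remove the part of per-cluster intercepts explained by [1, w].

    Returns (corrected, a, b) with gamma_i = a + b * w_i + corrected_i, so the
    corrected intercepts sum to zero and are uncorrelated with w.
    """
    m = len(gamma)
    w_bar = sum(w) / m
    g_bar = sum(gamma) / m
    sww = sum((wi - w_bar) ** 2 for wi in w)
    # Ordinary least-squares slope and intercept of gamma on w.
    b = sum((wi - w_bar) * (gi - g_bar) for wi, gi in zip(w, gamma)) / sww
    a = g_bar - b * w_bar
    corrected = [gi - a - b * wi for gi, wi in zip(gamma, w)]
    return corrected, a, b


# Hypothetical estimated intercepts that trend with a cluster-level covariate.
gamma_hat = [1.0, 2.0, 3.5, 4.0]
w = [0.0, 1.0, 2.0, 3.0]
corrected, a, b = correct_random_intercepts(gamma_hat, w)
# a and b are both 1.05 up to floating-point rounding;
# corrected is about [-0.05, -0.1, 0.35, -0.2]
```

Adding a to the fixed intercept and b to the coefficient of w leaves every fitted value unchanged, so the correction only reassigns effects between the fixed and the random part of the model.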