Gradient Boosting for Linear Mixed Models

Colin Griesbach, Benjamin Säfken, Elisabeth Waldmann

DOI: https://doi.org/10.48550/arXiv.2011.00947

2020-11-02

Abstract: Gradient boosting from the field of statistical learning is widely known as a powerful framework for estimation and selection of predictor effects in various regression models by adapting concepts from classification theory. Current boosting approaches also offer methods accounting for random effects and thus enable prediction of mixed models for longitudinal and clustered data. However, these approaches include several flaws resulting in unbalanced effect selection with falsely induced shrinkage and a low convergence rate on the one hand and biased estimates of the random effects on the other hand. We therefore propose a new boosting algorithm which explicitly accounts for the random structure by excluding it from the selection procedure, properly correcting the random effects estimates and in addition providing likelihood-based estimation of the random effects variance structure. The new algorithm offers an organic and unbiased fitting approach, which is shown via simulations and data examples.

Methodology

What problem does this paper attempt to address?

This paper addresses specific problems that arise when gradient boosting is applied to linear mixed models. Existing boosting approaches suffer from unbalanced effect selection, falsely induced shrinkage, and a low convergence rate, and they produce biased random-effect estimates. In particular, the random-effect estimates can be correlated with observed covariates, which biases the estimates of both fixed and random effects. These problems degrade the model's predictive performance and can also lead to inaccurate variable selection. To overcome them, the authors propose a new boosting algorithm that combines concepts from gradient boosting and likelihood-based boosting. The new algorithm excludes the random structure from the selection procedure, corrects the random-effect estimates, and provides likelihood-based estimation of the random-effect variance structure. The result is an organic and unbiased fitting approach, which the authors demonstrate via simulations and data examples. In short, the paper aims to develop a more effective gradient boosting algorithm that improves parameter estimation and variable selection in linear mixed models, especially for high-dimensional data with random effects.
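To make the idea of "excluding the random structure from the selection procedure" more concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' algorithm. It assumes a random-intercept model, squared-error loss, integer cluster labels 0..K-1 with every cluster observed, and more observations than clusters. The function name `boost_random_intercept` and the parameters `n_iter` and `nu` are hypothetical. Two steps are deliberately simplified stand-ins: the correction of the random effects is reduced to centring them, and the variance components come from an ANOVA-type moment estimator rather than the likelihood-based estimation the paper describes.

```python
import numpy as np

def boost_random_intercept(y, X, cluster, n_iter=300, nu=0.1):
    """Component-wise L2 boosting for a random-intercept model (illustrative).

    Fixed effects compete for selection in each iteration; the random
    intercepts are updated in every iteration and never compete.
    """
    n, p = X.shape
    n_clusters = int(cluster.max()) + 1
    counts = np.bincount(cluster, minlength=n_clusters)
    intercept = y.mean()
    beta = np.zeros(p)
    gamma = np.zeros(n_clusters)
    sigma2, tau2 = y.var(), 1.0          # assumed starting values
    col_ss = (X ** 2).sum(axis=0)        # squared norm of each covariate

    for _ in range(n_iter):
        # Negative gradient of the squared-error loss is the residual vector.
        resid = y - intercept - X @ beta - gamma[cluster]

        # Selection step: least-squares fit of each single covariate to the
        # residuals; only the one with the largest RSS reduction is updated.
        coefs = X.T @ resid / col_ss
        best = int(np.argmax(coefs ** 2 * col_ss))
        beta[best] += nu * coefs[best]

        # Random intercepts: ridge-type update applied in every iteration,
        # outside the selection step.
        resid = y - intercept - X @ beta - gamma[cluster]
        sums = np.bincount(cluster, weights=resid, minlength=n_clusters)
        gamma += nu * sums / (counts + sigma2 / tau2)

        # Simplified stand-in for the correction step: centre the random
        # intercepts and move their mean into the fixed intercept.
        shift = gamma.mean()
        gamma -= shift
        intercept += shift

        # Stand-in for likelihood-based variance estimation: ANOVA-type
        # moment estimates from residuals that exclude the random effects.
        partial = y - intercept - X @ beta
        means = np.bincount(cluster, weights=partial, minlength=n_clusters) / counts
        sigma2 = ((partial - means[cluster]) ** 2).sum() / (n - n_clusters)
        tau2 = max(means.var(ddof=1) - sigma2 * np.mean(1.0 / counts), 1e-8)

    return intercept, beta, gamma, sigma2, tau2

# Simulated clustered data: two informative covariates, six noise covariates.
rng = np.random.default_rng(0)
n_clusters, per_cluster, p = 30, 10, 8
cluster = np.repeat(np.arange(n_clusters), per_cluster)
X = rng.normal(size=(n_clusters * per_cluster, p))
true_beta = np.array([2.0, -1.0] + [0.0] * (p - 2))
true_gamma = rng.normal(scale=1.0, size=n_clusters)
noise = rng.normal(scale=0.5, size=len(cluster))
y = 1.0 + X @ true_beta + true_gamma[cluster] + noise

intercept, beta, gamma, sigma2, tau2 = boost_random_intercept(y, X, cluster)
print(np.round(beta, 2))
```

In this sketch the covariates are the only candidates in the selection step, so the random intercepts cannot crowd out fixed effects, which is the design choice the summary attributes to the new algorithm. The number of iterations is fixed here for simplicity; in practice a stopping iteration would usually be tuned, for example by cross-validation.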

Predictive analytics with gradient boosting in clinical medicine

A boosting method to select the random effects in linear mixed models

Linear mixed effects models for non‐Gaussian continuous repeated measurement data

Extension of the Gradient Boosting Algorithm for Joint Modeling of Longitudinal and Time-to-Event data

Extending Statistical Boosting - An Overview of Recent Methodological Developments

Significance Tests for Boosted Location and Scale Models with Linear Base-Learners

Optimization by gradient boosting

Boosting joint models for longitudinal and time-to-event data

The Evolution of Boosting Algorithms - From Machine Learning to Statistical Modelling

The column measure and Gradient-Free Gradient Boosting

Gradient and Newton boosting for classification and regression

Gradient Boosting: A Computationally Efficient Alternative to Markov Chain Monte Carlo Sampling for Fitting Large Bayesian Spatio-Temporal Binomial Regression Models

metboost: Exploratory regression analysis with hierarchically clustered data

Gradient matching accelerates mixed-effects inference for biochemical networks

A Fast Sampling Gradient Tree Boosting Framework

Boosting variable selection algorithm for linear regression models

Finding structure in data using multivariate tree boosting

Adaptive Fitting of Linear Mixed-Effects Models with Correlated Random-effects

Gradient Boosting for Hierarchical Data in Small Area Estimation

An update on statistical boosting in biomedicine