Boosting Distributional Copula Regression for Bivariate Binary, Discrete and Mixed Responses
Guillermo Briseño Sanchez, Nadja Klein, Hannah Klinkhammer, Andreas Mayr
2024-03-05
Abstract: Motivated by challenges in the analysis of biomedical data and observational
studies, we develop statistical boosting for the general class of bivariate
distributional copula regression with arbitrary marginal distributions, which
is suited to model binary, count, continuous or mixed outcomes. In our
framework, the joint distribution of arbitrary, bivariate responses is modelled
through a parametric copula. To arrive at a model for the entire conditional
distribution, not only the marginal distribution parameters but also the copula
parameters are related to covariates through additive predictors. We suggest
efficient and scalable estimation by means of an adapted component-wise
gradient boosting algorithm with statistical models as base-learners. A key
benefit of boosting as opposed to classical likelihood or Bayesian estimation
is the implicit data-driven variable selection mechanism as well as shrinkage
without additional input or assumptions from the analyst. To the best of our
knowledge, our implementation is the only one that combines a wide range of
covariate effects, marginal distributions, copula functions, and implicit
data-driven variable selection. We showcase the versatility of our approach on
data from genetic epidemiology, healthcare utilization and childhood
undernutrition. Our developments are implemented in the R package gamboostLSS,
fostering transparent and reproducible research.
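The copula construction in the abstract can be illustrated for two binary responses. The following is a minimal Python sketch, not the gamboostLSS implementation. It rests on these assumptions:

- A Clayton copula is used, which is only one of many possible parametric copulas.
- Logit links map to the two marginal success probabilities.
- A log link keeps the copula parameter positive.
- The helper names clayton_cdf and bivariate_binary_probs are hypothetical.
- The scalars eta1, eta2 and eta_theta stand in for additive predictors evaluated at a single observation.

```python
import math


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) for theta > 0 (positive dependence only)."""
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)


def logistic(eta):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def bivariate_binary_probs(eta1, eta2, eta_theta):
    """Cell probabilities P(Y1 = a, Y2 = b) for two binary responses.

    eta1, eta2: predictors of the marginal success probabilities (logit link).
    eta_theta:  predictor of the Clayton parameter (log link keeps theta > 0).
    Hypothetical helper for illustration only.
    """
    p1, p2 = logistic(eta1), logistic(eta2)
    theta = math.exp(eta_theta)
    u, v = 1.0 - p1, 1.0 - p2        # marginal CDFs evaluated at 0
    c = clayton_cdf(u, v, theta)     # P(Y1 <= 0, Y2 <= 0) = C(F1(0), F2(0))
    return {
        (0, 0): c,
        (0, 1): u - c,               # P(Y1 = 0) - P(Y1 = 0, Y2 = 0)
        (1, 0): v - c,               # P(Y2 = 0) - P(Y1 = 0, Y2 = 0)
        (1, 1): 1.0 - u - v + c,     # inclusion-exclusion
    }


if __name__ == "__main__":
    probs = bivariate_binary_probs(0.3, -0.5, math.log(2.0))
    for cell, prob in sorted(probs.items()):
        print(cell, round(prob, 4))
```

The logarithm of the cell probability for the observed pair (y1, y2) is that observation's log-likelihood contribution. A gradient boosting algorithm would work with its derivatives with respect to the three predictors. Because any copula satisfies the Fréchet bounds, all four cells are non-negative, and they sum to one.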
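The implicit variable selection and shrinkage mentioned in the abstract come from the component-wise structure of gradient boosting. The following simplified sketch does not implement the adapted algorithm for copula models. It assumes a single Gaussian response with squared-error loss and one linear least-squares base-learner per covariate. In each iteration, every base-learner is fitted to the negative gradient, only the best-fitting one is updated, and that update is scaled by a small step length nu. The function name componentwise_l2_boost and the defaults n_iter=200 and nu=0.1 are illustrative assumptions.

```python
import random


def componentwise_l2_boost(X, y, n_iter=200, nu=0.1):
    """Component-wise L2 boosting with one linear base-learner per covariate.

    X: list of covariate columns (lists of floats), assumed mean-centred.
    y: list of responses.
    Returns (offset, coefficients); covariates never selected keep 0.0.
    Hypothetical helper for illustration only.
    """
    n, p = len(y), len(X)
    offset = sum(y) / n                  # start from the constant fit
    fit = [offset] * n
    coef = [0.0] * p
    for _ in range(n_iter):
        # Negative gradient of the loss 0.5 * (y - f)^2 is the residual y - f.
        u = [yi - fi for yi, fi in zip(y, fit)]
        best_j, best_b, best_rss = 0, 0.0, float("inf")
        for j, xj in enumerate(X):
            # Least-squares slope of u on x_j (no intercept; x_j is centred).
            b = sum(x * ui for x, ui in zip(xj, u)) / sum(x * x for x in xj)
            rss = sum((ui - b * x) ** 2 for x, ui in zip(xj, u))
            if rss < best_rss:
                best_j, best_b, best_rss = j, b, rss
        # Update only the best-fitting component, shrunk by the step length nu.
        coef[best_j] += nu * best_b
        fit = [fi + nu * best_b * x for fi, x in zip(fit, X[best_j])]
    return offset, coef


if __name__ == "__main__":
    random.seed(1)
    n = 200
    # Six centred covariates; only the first two affect the response.
    cols = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(6)]
    X = []
    for col in cols:
        m = sum(col) / n
        X.append([x - m for x in col])
    y = [2.0 * X[0][i] - 1.5 * X[1][i] + random.gauss(0, 0.1) for i in range(n)]
    offset, coef = componentwise_l2_boost(X, y, n_iter=300)
    print([round(c, 2) for c in coef])
```

If the algorithm is stopped early, covariates that were never selected keep a coefficient of exactly zero. This is the sense in which the number of iterations acts as a variable selection device, while nu provides shrinkage. In the distributional copula setting, a comparable update would be applied to the predictors of the marginal and copula parameters. Its negative gradients would be derived from the joint log-likelihood rather than from squared-error loss.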
Methodology