Abstract: We develop a model-based boosting approach for multivariate distributional regression within the framework of generalized additive models for location, scale, and shape. Our approach enables the simultaneous modeling of all distribution parameters of an arbitrary parametric distribution of a multivariate response conditional on explanatory variables, while being applicable to potentially high-dimensional data. Moreover, the boosting algorithm incorporates data-driven variable selection, taking different types of effects into account. A special merit of our approach is that it allows for modeling the association between multiple continuous or discrete outcomes through the relevant covariates. After a detailed simulation study investigating estimation and prediction performance, we demonstrate the full flexibility of our approach in three diverse biomedical applications. The first is based on high-dimensional genomic cohort data from the UK Biobank, considering a bivariate binary response (chronic ischemic heart disease and high cholesterol). Here, we are able to identify genetic variants that are informative for the association between cholesterol and heart disease. The second application considers the demand for health care in Australia, with the number of consultations and the number of prescribed medications as a bivariate count response. The third application analyzes two dimensions of childhood undernutrition in Nigeria as a bivariate response; we find that the correlation between the two undernutrition scores differs considerably depending on the child's age and the region the child lives in.

Splitting models for multivariate count data

Tree Pólya Splitting distributions for multivariate count data

An R Package to Partition Observation Data Used for Model Development and Evaluation to Achieve Model Generalizability

Parametric Modelling of Multivariate Count Data Using Probabilistic Graphical Models

Pólya-splitting distributions as stationary solutions of multivariate birth–death processes under extended neutral theory

Multivariate Bernoulli distribution

A parsimonious family of multivariate Poisson-lognormal distributions for clustering multivariate count data

Sliced Wasserstein Regression

A new copula regression model for hierarchical data

A Generic Multivariate Distribution for Counting Data

Non-separable Models with High-dimensional Data

A Bayesian Zero-Inflated Dirichlet-Multinomial Regression Model for Multivariate Compositional Count Data

Model based clustering of multinomial count data

Review of Probability Distributions for Modeling Count Data

Classification of multivariate count data with multivariate log-linear conditional Poisson distribution

Modeling and inferences for bounded multivariate time series of counts

Boosting Multivariate Structured Additive Distributional Regression Models

Model-aware Quantile Regression for Discrete Data

Transition models for count data: a flexible alternative to fixed distribution models

Hierarchical approaches for flexible and interpretable binary regression models

Resampling-Based Multisplit Inference for High-Dimensional Regression