Sufficient dimension reduction for a novel class of zero-inflated graphical models

Eric Koplin, Liliana Forzani, Diego Tomassi, Ruth M. Pfeiffer
DOI: https://doi.org/10.1016/j.csda.2024.107959
IF: 2.035
2024-04-10
Computational Statistics & Data Analysis
Abstract: Graphical models allow modeling of complex dependencies among components of a random vector. In many applications of graphical models, however, for example microbiome data, the data have an excess number of zero values. New pairwise graphical models with distributions in an exponential family are presented that accommodate excess numbers of zeros in the random vector components. First, these multivariate distributions are characterized in terms of univariate conditional distributions. Then, model predictors that arise from such a pairwise graphical model with excess zeros are modeled as functions of an outcome, and the corresponding first-order sufficient dimension reduction (SDR) is derived. That is, linear combinations of the predictors that contain all the information for the regression of the outcome as a function of the predictors are obtained. To incorporate variable selection, the SDR is estimated using a pseudo-likelihood with a hierarchical penalty that prioritizes sparse interactions only for variables associated with the outcome. This method yields consistent estimators of the reduction and can be applied to continuous or categorical outcomes. The new methods are then illustrated by studying normal, Poisson and truncated Poisson graphical models with excess zeros in simulations and by analyzing microbiome data from the American Gut Project. The models provided robust variable selection, and the predictive performance of the Poisson zero-inflation pairwise graphical model was equal to or better than that obtained from applying other available methods for the analysis of microbiome data.
statistics & probability; computer science, interdisciplinary applications