Abstract: Microorganisms play critical roles in human health and disease. It is well known that microbes live in diverse communities in which they interact synergistically or antagonistically. Thus, multivariate statistical models are preferred for estimating microbial associations with clinical covariates. Multivariate models allow one to estimate and exploit complex interdependencies among multiple taxa, yielding more powerful tests of exposure or treatment effects than taxon-specific univariate analyses. In addition, the analysis of microbial count data requires special attention because the data commonly exhibit zero inflation. To meet these needs, we developed a Bayesian variable selection model for multivariate count data with excess zeros that incorporates information on the covariance structure of the outcomes (counts for multiple taxa) while estimating associations with the mean levels of these outcomes. Although a great deal of effort has gone into zero-inflated models for longitudinal data, little attention has been given to high-dimensional multivariate zero-inflated data modeled via a general correlation structure. Through simulation, we compared the performance of the proposed method with that of existing univariate approaches for both the binary and count parts of the model. When outcomes were correlated, the proposed variable selection method maintained the type I error rate while boosting the ability to identify true associations in the binary component of the model. For the count part of the model, the univariate method had higher power than the multivariate approach in some scenarios. This higher power came at the cost of a highly inflated false discovery rate, which was not observed with the proposed multivariate method. We applied the approach to oral microbiome data from the Pediatric HIV/AIDS Cohort Oral Health Study and identified five species (of 44) associated with HIV infection.
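
To make the data structure described in the abstract concrete, the following is a minimal simulation sketch of correlated, zero-inflated counts for several taxa. It is not the authors' model or code: the function name `simulate_zi_counts`, the Poisson count distribution, the exchangeable correlation on the log-mean scale, and all parameter values are assumptions chosen only for illustration.

```python
import numpy as np

def simulate_zi_counts(n_samples, n_taxa, zero_prob, mean_count, rho, seed=0):
    """Simulate correlated zero-inflated Poisson counts (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    # Assumed exchangeable correlation across taxa: 1 on the diagonal, rho elsewhere.
    cov = np.full((n_taxa, n_taxa), rho) + (1.0 - rho) * np.eye(n_taxa)
    latent = rng.multivariate_normal(np.zeros(n_taxa), cov, size=n_samples)
    # Count part: Poisson means perturbed by the shared latent effects.
    counts = rng.poisson(mean_count * np.exp(0.5 * latent))
    # Binary part: each entry is a structural zero with probability zero_prob.
    structural_zero = rng.random((n_samples, n_taxa)) < zero_prob
    return np.where(structural_zero, 0, counts)

counts = simulate_zi_counts(n_samples=2000, n_taxa=5, zero_prob=0.4,
                            mean_count=5.0, rho=0.6)
# Fraction of zeros, to compare against what the count part alone would produce.
observed_zero_frac = (counts == 0).mean()
```

In this sketch the zeros come from two sources, the binary part (structural zeros) and the count part (sampling zeros), which is why the abstract treats the two components separately.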

Bayesian Variable Shrinkage and Selection in Compositional Data Regression: Application to Oral Microbiome

Bayesian compositional regression with microbiome features via variational inference

Bayesian compositional regression with flexible microbiome feature aggregation and selection

A Bayesian Zero-Inflated Dirichlet-Multinomial Regression Model for Multivariate Compositional Count Data

Bayesian Variable Selection for Multivariate Zero-Inflated Models: Application to Microbiome Count Data

Variable selection for sparse Dirichlet-multinomial regression with an application to microbiome data analysis

Bayesian graphical compositional regression for microbiome data

Bayesian compositional models for ordinal response

Bayesian Mixed Effects Models for Zero-inflated Compositions in Microbiome Data Analysis

It's All Relative: New Regression Paradigm for Microbiome Compositional Data

An integrative Bayesian Dirichlet-multinomial regression model for the analysis of taxonomic abundances in microbiome data

Bayesian Structural Learning with Parametric Marginals for Count Data: An Application to Microbiota Systems

Robust Regression with Compositional Covariates

A Bayesian model of microbiome data for simultaneous identification of covariate associations and prediction of phenotypic outcomes

Variable selection in microbiome compositional data analysis

Variable Selection in Regression with Compositional Covariates

Bayesian Nonparametric Ordination for the Analysis of Microbial Communities

Regression Analysis for Microbiome Compositional Data

Negative Binomial factor regression with application to microbiome data analysis

Two-Step Mixed-Type Multivariate Bayesian Sparse Variable Selection with Shrinkage Priors

Bayesian Regression Using a Prior on the Model Fit: The R2-D2 Shrinkage Prior