A Bayesian Nonparametric Approach for Identifying Differentially Abundant Taxa in Multigroup Microbiome Data with Covariates

Archie Sachdeva, Somnath Datta, Subharup Guha
DOI: https://doi.org/10.48550/arXiv.2206.10108
2023-12-28
Abstract: Scientific studies in the last two decades have established the central role of the microbiome in disease and health. Differential abundance analysis seeks to identify microbial taxa associated with sample groups defined by a factor such as disease subtype, geographical region, or environmental condition. The results, in turn, help clinical practitioners and researchers diagnose disease and develop treatments more effectively. However, microbiome data analysis is uniquely challenging due to high dimensionality, sparsity, compositionality, and collinearity. There is a critical need for unified statistical approaches for differential analysis in the presence of covariates. We develop a zero-inflated Bayesian nonparametric (ZIBNP) methodology that meets these multipronged challenges. The proposed technique flexibly adapts to the unique data characteristics, casts the high proportion of zeros in a censoring framework, and mitigates high dimensionality and collinearity by utilizing the dimension-reducing property of the semiparametric Chinese restaurant process. Additionally, the ZIBNP approach relates the microbiome sampling depths to inferential precision while accommodating the compositional nature of microbiome data. Through simulation studies and analyses of the CAnine Microbiome during Parasitism (CAMP) and Global Gut microbiome datasets, we demonstrate the accuracy of ZIBNP compared to established methods for differential abundance analysis in the presence of covariates.
Methodology