mbDecoda: a debiased approach to compositional data analysis for microbiome surveys

Yuxuan Zong, Hongyu Zhao, Tao Wang
DOI: https://doi.org/10.1093/bib/bbae205
IF: 9.5
2024-05-05
Briefings in Bioinformatics
Abstract: Potentially pathogenic or probiotic microbes can be identified by comparing their abundance levels between healthy and diseased populations, or more broadly, by linking microbiome composition with clinical phenotypes or environmental factors. However, in microbiome studies, feature tables provide relative rather than absolute abundance of each feature in each sample, as the microbial loads of the samples and the ratios of sequencing depth to microbial load are both unknown and subject to considerable variation. Moreover, microbiome abundance data are count-valued, often over-dispersed and contain a substantial proportion of zeros. To carry out differential abundance analysis while addressing these challenges, we introduce mbDecoda, a model-based approach for debiased analysis of sparse compositions of microbiomes. mbDecoda employs a zero-inflated negative binomial model, linking mean abundance to the variable of interest through a log link function, and it accommodates the adjustment for confounding factors. To efficiently obtain maximum likelihood estimates of model parameters, an Expectation Maximization algorithm is developed. A minimum coverage interval approach is then proposed to rectify compositional bias, enabling accurate and reliable absolute abundance analysis. Through extensive simulation studies and analysis of real-world microbiome datasets, we demonstrate that mbDecoda compares favorably with state-of-the-art methods in terms of effectiveness, robustness and reproducibility.
biochemical research methods, mathematical & computational biology
What problem does this paper attempt to address?
This paper addresses two key problems in differential abundance analysis of microbiome data:

1. **Compositional bias**: Feature tables provide relative rather than absolute abundances, because both the microbial load of each sample and the ratio of sequencing depth to microbial load are unknown and highly variable. Inferring changes in absolute abundance directly from relative abundances can therefore lead to high false-positive and false-negative rates.
2. **Sparsity and over-dispersion**: Microbiome abundance data are counts that are often over-dispersed and contain a substantial proportion of zeros, which makes traditional statistical methods less effective.

To address these challenges, the paper proposes **mbDecoda**, a model-based approach for debiased analysis of sparse compositions of microbiomes. Specifically:

- **Model selection**: mbDecoda uses a zero-inflated negative binomial (ZINB) model for the abundance counts, which accommodates both sparsity and over-dispersion.
- **Link function**: Mean abundance is linked to the variable of interest through a log link function, with support for adjusting for confounding factors.
- **Parameter estimation**: An expectation-maximization (EM) algorithm is developed to efficiently obtain maximum likelihood estimates of the model parameters.
- **Bias correction**: A minimum coverage interval (MCI) approach is proposed to rectify compositional bias, enabling accurate and reliable absolute abundance analysis.

Through extensive simulation studies and analysis of real-world microbiome datasets, the paper shows that mbDecoda compares favorably with state-of-the-art methods in terms of effectiveness, robustness, and reproducibility.
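To make the modelling ingredients above concrete, here is a minimal sketch of a zero-inflated negative binomial log-likelihood with a log link for a single taxon. It is an illustration under stated assumptions, not the mbDecoda implementation: the mean/size parameterisation, the single covariate, and the names `zinb_log_pmf`, `mean_abundance`, and `log_likelihood` are all hypothetical, and the summary above does not specify the paper's exact parameterisation. The EM fitting and the MCI debiasing step are not shown.

```python
import math


def zinb_log_pmf(y, mu, size, pi):
    """Log-probability of count y under a zero-inflated negative binomial.

    mu   : mean of the negative binomial component (> 0)
    size : inverse-dispersion parameter (> 0); NB variance = mu + mu**2 / size
    pi   : probability of a structural (excess) zero, 0 <= pi < 1
    (This parameterisation is an assumption made for illustration.)
    """
    # Negative binomial log-pmf in the mean/size parameterisation.
    log_nb = (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )
    if y == 0:
        # A zero is either a structural zero or a sampling zero from the NB part.
        return math.log(pi + (1.0 - pi) * math.exp(log_nb))
    # A positive count can only come from the NB part.
    return math.log1p(-pi) + log_nb


def mean_abundance(x, beta0, beta1):
    """Log link: log(mu) = beta0 + beta1 * x, for one covariate x."""
    return math.exp(beta0 + beta1 * x)


def log_likelihood(counts, xs, beta0, beta1, size, pi):
    """Sum of ZINB log-probabilities over samples for a single taxon."""
    return sum(
        zinb_log_pmf(y, mean_abundance(x, beta0, beta1), size, pi)
        for y, x in zip(counts, xs)
    )
```

In this sketch, `beta1` plays the role of the log fold change in mean abundance per unit of the covariate. Because only relative abundances are observed, an estimate of such a coefficient would in general be shifted by an unknown sample-level factor, which is the compositional bias that the paper's MCI step is described as correcting.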