Abstract

Motivation: Microbiome data have proven extremely useful for understanding microbial communities and their impacts in health and disease. Although microbiome analysis methods and standards are evolving rapidly, obtaining meaningful and interpretable results from microbiome studies still requires careful statistical treatment. In particular, many existing and emerging methods for differential abundance (DA) analysis fail to account for the fact that microbiome data are high-dimensional and sparse, compositional, negatively and positively correlated, and phylogenetically structured. To better describe microbiome data and improve the power of DA testing, there is still a great need for the continued development of appropriate statistical methodology.

Results: In this article, we propose a model-based approach for microbiome data transformation, and a phylogenetically informed procedure for DA testing based on the transformed data. First, we extend the Dirichlet-tree multinomial (DTM) to a zero-inflated DTM for multivariate modeling of microbial counts, addressing data sparsity as well as the correlation and phylogenetic relationships among bacterial taxa. Then, within this framework and using a Bayesian formulation, we introduce the posterior mean transformation to convert raw counts into non-zero relative abundances that sum to one, accounting for the compositional nature of microbiome data. Second, using the transformed data, we propose adaptive analysis of composition of microbiomes (adaANCOM) for DA testing by constructing log-ratios adaptively on the tree for each taxon, greatly reducing the computational complexity of ANCOM in high dimensions. Finally, we present extensive simulation studies, an analysis of HMP data across 18 body sites and 2 visits, and an application to a gut microbiome and malnutrition study, to investigate the performance of the posterior mean transformation and adaANCOM. Comparisons with ANCOM and other DA testing procedures show that adaANCOM controls the false discovery rate well, allows for easy interpretation of the results, and is computationally efficient for high-dimensional problems.

Availability and implementation: The developed R package is available at https://github.com/ZRChao/adaANCOM. For replicability purposes, scripts for our simulations and data analysis are available at https://github.com/ZRChao/Papers_supplementary.

Supplementary information: Supplementary data are available at Bioinformatics online.
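
To illustrate the idea behind a posterior mean transformation, the following is a minimal sketch under a simplifying assumption: it uses a plain Dirichlet-multinomial model rather than the zero-inflated Dirichlet-tree multinomial described in the abstract, and it ignores the tree structure entirely. The function name `posterior_mean_proportions` and the prior values in `alpha` are hypothetical choices for illustration and are not taken from the adaANCOM package. Under a Dirichlet prior with parameters alpha, the posterior mean of the proportion of taxon i given counts x is (x_i + alpha_i) / (sum(x) + sum(alpha)), which is strictly positive and sums to one across taxa.

```python
from typing import List, Sequence


def posterior_mean_proportions(
    counts: Sequence[float], alpha: Sequence[float]
) -> List[float]:
    """Posterior mean of taxon proportions under a plain Dirichlet-multinomial.

    Assumption: this is a simplified stand-in, not the zero-inflated
    Dirichlet-tree multinomial model from the paper.
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    if any(a <= 0 for a in alpha):
        raise ValueError("Dirichlet parameters must be strictly positive")
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")

    # Dirichlet is conjugate to the multinomial: posterior is Dirichlet(x + alpha),
    # whose mean is (x_i + alpha_i) / (sum(x) + sum(alpha)).
    total = sum(counts) + sum(alpha)
    return [(c + a) / total for c, a in zip(counts, alpha)]


# Hypothetical sample with two zero counts and an illustrative symmetric prior.
counts = [10, 0, 5, 0]
alpha = [0.5, 0.5, 0.5, 0.5]
props = posterior_mean_proportions(counts, alpha)
print(props)
```

Because every Dirichlet parameter is positive, zero counts map to small positive proportions, so log-ratios between taxa are always defined on the transformed data.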

Dirichlet-tree multinomial mixtures for clustering microbiome compositions

Cluster analysis of microbiome data by using mixtures of Dirichlet–multinomial regression models

Bayesian biclustering for microbial metagenomic sequencing data via multinomial matrix factorization

Sparse tree-based clustering of microbiome data to characterize microbiome heterogeneity in pancreatic cancer

A phylogenetic scan test on Dirichlet-tree multinomial model for microbiome data

Clustering on Human Microbiome Sequencing Data: A Distance-Based Unsupervised Learning Model

DCMD: Distance-based Classification Using Mixture Distributions on Microbiome Data

An integrative Bayesian Dirichlet-multinomial regression model for the analysis of taxonomic abundances in microbiome data

Mixed Effect Dirichlet-Tree Multinomial for Longitudinal Microbiome Data and Weight Prediction

Bayesian Modeling of Microbiome Data for Differential Abundance Analysis

Bayesian graphical compositional regression for microbiome data

Microbiome subcommunity learning with logistic-tree normal latent Dirichlet allocation

Poisson hurdle model-based method for clustering microbiome features

Bayesian Mixed Effects Models for Zero-inflated Compositions in Microbiome Data Analysis

Dirichlet distribution parameter estimation with applications in microbiome analyses

Transformation and differential abundance analysis of microbiome data incorporating phylogeny

Variable selection for sparse Dirichlet-multinomial regression with an application to microbiome data analysis

Logistic Normal Multinomial Factor Analyzers for Clustering Microbiome Data

DIMM-SC: a Dirichlet mixture model for clustering droplet-based single cell transcriptomic data

Interpreting 16S metagenomic data without clustering to achieve sub-OTU resolution

PhyloMix: Enhancing microbiome-trait association prediction through phylogeny-mixing augmentation