Abstract:<p>Correlation and association analyses are among the most widely used statistical methods in research fields, including microbiome and integrative multiomics studies. Correlation and association have two implications: dependence and co-occurrence. Microbiome data are structured as a phylogenetic tree and have several unique characteristics, including high dimensionality, compositionality, sparsity with excess zeros, and heterogeneity. These unique characteristics cause several statistical issues when analyzing microbiome data and integrating multiomics data, such as large p and small n, dependency, overdispersion, and zero-inflation. In microbiome research, on the one hand, classic correlation and association methods are still applied in real studies and used for the development of new methods; on the other hand, new methods have been developed to target statistical issues arising from the unique characteristics of microbiome data. Here, we first provide a comprehensive view of classic and newly developed univariate correlation and association-based methods. We discuss the appropriateness and limitations of using classic methods and demonstrate how the newly developed methods mitigate the issues of microbiome data. Second, we emphasize that the concepts of correlation and association analyses have shifted with the introduction of network analysis, microbe-metabolite interactions, functional analysis, etc. Third, we introduce multivariate correlation and association-based methods, which are organized by the categories of exploratory, interpretive, and discriminatory analyses and classification methods. Fourth, we focus on the hypothesis testing of univariate and multivariate regression-based association methods, including alpha and beta diversity-based, count-based, and relative abundance (or compositional)-based association analyses. We demonstrate the characteristics and limitations of each approach.
Fifth, we introduce two specific microbiome-based methods: phylogenetic tree-based association analysis and testing for survival outcomes. Sixth, we provide an overall view of longitudinal methods in the analysis of microbiome and omics data, which cover standard, static, regression-based time series methods, principal trend analysis, and newly developed univariate overdispersed and zero-inflated as well as multivariate distance/kernel-based longitudinal models. Finally, we comment on current association analysis and its future directions in microbiome and multiomics studies.</p>

Prediction analysis for microbiome sequencing data

Negative Binomial factor regression with application to microbiome data analysis

Using MicrobiomeAnalyst for comprehensive statistical, functional, and meta-analysis of microbiome data

Correlation and association analyses in microbiome study integrating multiomics in health and disease

Regression Analysis for Microbiome Compositional Data

Transformation and differential abundance analysis of microbiome data incorporating phylogeny

Variable selection for sparse Dirichlet-multinomial regression with an application to microbiome data analysis

Microbiome, Metagenomics, and High-Dimensional Compositional Data Analysis

Bayesian Modeling of Microbiome Data for Differential Abundance Analysis

It's All Relative: New Regression Paradigm for Microbiome Compositional Data

An integrative Bayesian Dirichlet-multinomial regression model for the analysis of taxonomic abundances in microbiome data

A Flexible Zero-Inflated Poisson-Gamma model with application to microbiome read counts

MicroPredict: predicting species-level taxonomic abundance of whole-shotgun metagenomic data using only 16S amplicon sequencing data

A Fast Machine Learning Workflow for Rapid Phenotype Prediction from Whole Shotgun Metagenomes

Opportunities and limits of combining microbiome and genome data for complex trait prediction

A nonparametric spatial test to identify factors that shape a microbiome

Comparison of the effectiveness of different normalization methods for metagenomic cross-study phenotype prediction under heterogeneity

Predictive modeling of microbial data with interaction effects

A Bayesian model of microbiome data for simultaneous identification of covariate associations and prediction of phenotypic outcomes

Predicted meta-omics: a potential solution to multi-omics data scarcity in microbiome studies