Microbiome, Metagenomics, and High-Dimensional Compositional Data Analysis

Hongzhe Li
DOI: https://doi.org/10.1146/annurev-statistics-010814-020351
IF: 7.9
2015-04-10
Annual Review of Statistics and Its Application
Abstract: The human microbiome is the totality of all microbes in and on the human body, and its importance in health and disease has been increasingly recognized. High-throughput sequencing technologies have recently enabled scientists to obtain an unbiased quantification of all microbes constituting the microbiome. Often, a single sample can produce hundreds of millions of short sequencing reads. However, unique characteristics of the data produced by the new technologies, as well as the sheer magnitude of these data, make drawing valid biological inferences from microbiome studies difficult. Analysis of these big data poses great statistical and computational challenges. Important issues include normalization and quantification of relative taxa, bacterial genes, and metabolic abundances; incorporation of phylogenetic information into analysis of metagenomics data; and multivariate analysis of high-dimensional compositional data. We review existing methods, point out their limitations, and outline future research directions.
statistics & probability; mathematics, interdisciplinary applications