Prediction analysis for microbiome sequencing data

Tao Wang, Can Yang, Hongyu Zhao
DOI: https://doi.org/10.48550/arXiv.1710.02616
2017-10-07
Abstract: One primary goal of human microbiome studies is to predict host traits from the human microbiota. However, microbial community sequencing data pose significant challenges to the development of statistical methods. In particular, the samples have different library sizes, the data contain many zeros, and the counts are often over-dispersed. To address these challenges, we introduce a new statistical framework, called predictive analysis in metagenomics via inverse regression (PAMIR). An inverse regression model is developed for over-dispersed microbiota counts given the trait, and a prediction rule is then constructed by taking advantage of the dimension-reduction structure in the model. An efficient Monte Carlo expectation-maximization algorithm is designed to carry out maximum likelihood estimation. We demonstrate the advantages of PAMIR through simulations and a real data example.
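The abstract does not give the model's exact form, so what follows is only a minimal sketch of the general inverse-regression idea under simplifying assumptions, not the PAMIR model itself. It assumes a plain multinomial inverse regression: given a binary trait y, the count vector x with library size m follows Multinomial(m, softmax(alpha + phi * y)), with per-taxon intercepts alpha and trait effects phi treated as known. It ignores over-dispersion and skips the Monte Carlo EM fitting step that the paper uses. The helper names (`log_sum_exp`, `reduction_score`, `posterior_log_odds`) and the toy parameter values are hypothetical. The sketch shows how Bayes' rule turns the inverse model into a prediction rule that depends on the counts only through a one-dimensional score z = sum_j phi_j * x_j and the library size m.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def reduction_score(counts, phi):
    """Hypothetical helper: z = sum_j phi_j * x_j, a 1-D summary of the counts."""
    return sum(p * c for p, c in zip(phi, counts))


def posterior_log_odds(counts, alpha, phi, prior_y1=0.5):
    """log P(y=1 | x) - log P(y=0 | x) under the ASSUMED inverse model
    x | y ~ Multinomial(m, softmax(alpha + phi * y)) for a binary trait y.

    alpha, phi: per-taxon intercepts and trait effects (treated as known here;
    the paper instead estimates its own model by Monte Carlo EM).
    """
    m = sum(counts)  # library size enters the rule explicitly
    log_norm_y0 = log_sum_exp(alpha)
    log_norm_y1 = log_sum_exp([a + p for a, p in zip(alpha, phi)])
    # The multinomial coefficient and sum_j x_j * alpha_j cancel in the ratio,
    # so the counts matter only through the pair (z, m).
    z = reduction_score(counts, phi)
    prior_log_odds = math.log(prior_y1 / (1.0 - prior_y1))
    return prior_log_odds + z - m * (log_norm_y1 - log_norm_y0)


if __name__ == "__main__":
    alpha = [0.0, 0.0, 0.0]  # toy intercepts for three taxa
    phi = [1.0, 0.0, -1.0]   # taxon 1 enriched, taxon 3 depleted when y = 1
    for x in ([10, 5, 0], [0, 5, 10], [5, 10, 0], [6, 8, 1]):
        print(x, round(posterior_log_odds(x, alpha, phi), 3))
```

In this toy run, [5, 10, 0] and [6, 8, 1] have the same score z = 5 and library size m = 15, so they receive identical predictions even though their individual counts differ; zero counts need no special handling. Handling over-dispersion, as PAMIR does, requires a richer count model than this multinomial.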