Cluster analysis of microbiome data by using mixtures of Dirichlet–multinomial regression models

Sanjeena Subedi, Drew Neish, Stephen Bak, Zeny Feng
DOI: https://doi.org/10.1111/rssc.12432
2020-07-26
Abstract: The human gut microbiome is one of the fundamental components of our physiology, and exploring the relationship between biological and environmental covariates and the resulting taxonomic composition of a given microbial community is an active area of research. Previously, a Dirichlet–multinomial regression framework was suggested to model this relationship, but it did not account for any underlying latent group structure. An underlying group structure of guts (such as enterotypes) has been observed across gut microbiome samples, in which guts in the same group share similar biota compositions. In this paper, a finite mixture of Dirichlet–multinomial regression models is proposed that accounts for this underlying group structure and allows for a probabilistic investigation of the relationship between bacterial abundance and biological and/or environmental covariates within each inferred group. Furthermore, finite mixtures of regression models that incorporate the concomitant effect of the covariates on the resulting mixing proportions are also proposed and examined within the Dirichlet–multinomial framework. We use the proposed mixture model to gain insight into underlying subgroups in a microbiome data set comprising tumour and healthy samples, and into the relationships between covariates and microbial abundance in those subgroups.
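To make the model class in the abstract concrete, the following is a minimal sketch of a mixture of Dirichlet–multinomial regressions for a single sample. It is not the authors' implementation. The log link between covariates and Dirichlet parameters, the function names, and all numeric values are assumptions chosen for illustration. Parameter estimation is not shown, and the mixing proportions are held fixed rather than depending on covariates.

```python
import math


def dm_log_pmf(counts, alpha):
    """Log probability of taxon counts under a Dirichlet-multinomial."""
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient: n! / prod(x_j!)
    out = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # Ratio of Dirichlet normalising constants
    out += math.lgamma(a) - math.lgamma(n + a)
    out += sum(math.lgamma(x + al) - math.lgamma(al)
               for x, al in zip(counts, alpha))
    return out


def regression_alpha(covariates, beta):
    """Dirichlet parameters from covariates via an assumed log link.

    beta holds one row per taxon: [intercept, coef_1, coef_2, ...].
    """
    return [math.exp(row[0] + sum(b * z for b, z in zip(row[1:], covariates)))
            for row in beta]


def responsibilities(counts, covariates, mixing, betas):
    """Posterior probability that one sample belongs to each group."""
    log_terms = [
        math.log(p) + dm_log_pmf(counts, regression_alpha(covariates, b))
        for p, b in zip(mixing, betas)
    ]
    # Log-sum-exp trick for numerical stability
    m = max(log_terms)
    weights = [math.exp(t - m) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical two-group example with three taxa and one covariate.
betas = [
    [[1.0, 0.5], [0.0, 0.0], [-1.0, 0.0]],   # group 1 favours taxon 1
    [[-1.0, 0.0], [0.0, 0.0], [1.0, 0.5]],   # group 2 favours taxon 3
]
post = responsibilities([30, 5, 1], [1.0], [0.5, 0.5], betas)
```

Here `post` holds the probabilities of group membership for the sample, which is the quantity a model-based clustering assigns samples by. In the concomitant-variable variant mentioned in the abstract, the entries of `mixing` would themselves be functions of the covariates.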
statistics & probability