Smart Health: International Conference, ICSH 2019, Shenzhen, China, July 1–2, 2019, Proceedings

Ming Sheng, Yuyao Shao, Yong Zhang, Chaoqin Li, Chunxiao Xing, Han Zhang, Jingwen Wang, Fei Gao
DOI: https://doi.org/10.1007/978-3-030-34482-5
2019-01-01
Smart Health
Abstract: Studying the complexity of metagenome populations is crucial to understanding different microbial communities. The potential number of microbes in the environment is far greater than what is currently known. However, most metagenomic projects contain only tens to hundreds of samples, and most microbes can hardly be captured at such small sample sizes. Thus, there is much “dark matter” that has never been observed. In this study, we propose a statistical model named SAM (Species Appearance Model), which uses only one to two hundred samples to optimize its parameters and estimates the potential richness of dark matter at much larger data sizes. An index named ESS (Estimated saturated sample size) is also proposed as an indicator of the complexity of a metagenome population. On the American Gut Project (AGP) dataset, SAM can precisely predict the OTU richness of the pan-metagenome with more than 1000 samples using only 200 samples. The ESS of AGP is ~25,000, which means the AGP population is very complex. Using SAM, researchers can estimate and decide how many samples they need to collect when initiating a new metagenomic project. The ESS values of different metagenomic populations can also serve as a guide to understanding their differing complexities.
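The abstract does not give SAM's functional form, so the sketch below does not implement SAM. It only illustrates the quantity SAM is described as predicting: the OTU richness of the pooled (pan) metagenome as a function of the number of samples. The function name `accumulation_curve`, the toy OTU identifiers, and the permutation-averaging approach are assumptions made for illustration, not details from the paper.

```python
import random

def accumulation_curve(samples, n_permutations=100, seed=0):
    """Mean number of distinct OTUs seen after pooling k samples, for k = 1..n.

    `samples` is a list of sets; each set holds the OTU ids detected in one sample.
    The order in which samples are pooled is randomized and the counts averaged.
    """
    rng = random.Random(seed)
    n = len(samples)
    totals = [0.0] * n
    for _ in range(n_permutations):
        order = samples[:]          # copy so the caller's list is not reordered
        rng.shuffle(order)
        seen = set()
        for k, otus in enumerate(order):
            seen |= otus            # pool this sample's OTUs with those seen so far
            totals[k] += len(seen)  # richness after k + 1 samples
    return [t / n_permutations for t in totals]

# Hypothetical toy data: four samples, five distinct OTUs overall.
samples = [
    {"otu1", "otu2"},
    {"otu1", "otu3"},
    {"otu2", "otu4", "otu5"},
    {"otu1"},
]
curve = accumulation_curve(samples)
print(curve)
```

A curve that is still rising at the largest available sample count suggests that unobserved OTUs remain. Under the abstract's description, a model such as SAM would be fitted to the early part of such a curve (one to two hundred samples) and extrapolated to larger sample sizes.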