MicrobiotaCN: an on-line, standard, convenient and comprehensive microbiome data analysis platform based on self-built gut prokaryotic genome collection

Bo Zheng, Junming Xu, Junjie Qin, Tingting Fan, Lulu Li, Yan Chen, Yuyang Jiang
DOI: https://doi.org/10.21203/rs.3.rs-1669983/v1
2022-01-01
Abstract:
Background: The quality of metagenomic data analysis depends on the quality of the reference database, yet current databases have shortcomings. Different studies often use different reference databases, so their results are inconsistent, can only be analyzed in isolation, and are difficult to compare across projects. Our work aimed to create a novel collection of human gut prokaryotic genomes, MBCN, to serve as the reference database of a standardized metagenomic analysis platform named MicrobiotaCN, which allows researchers to perform metagenomic analysis efficiently with the same standard pipeline.
Results: About 2,477 human gut metagenomic samples were screened, and 16,785 metagenome-assembled genomes (MAGs) were assembled using a standardized pipeline. The MAGs were then combined with the representative genomes from the RefSeq and UHGG collections and clustered at 95% average nucleotide identity (ANI), and a pan-genome was constructed for each cluster. The MBCN collection contained 14,166 species-level genome clusters. A Kraken2 database was built from the MBCN pan-genomes, and an mOTUs database was built from the representative genome of each MBCN cluster. Compared with other collections such as UHGG on simulated reads and virtual bio-projects, the Kraken2 database built from MBCN achieved a higher assignment rate and more accurate profiling. In virtual bio-projects and practical applications, MBCN had the potential to discover more biomarkers than other databases. We profiled 1,082 human gut metagenomic samples with the MBCN Kraken2 database and organized the profiles and metadata on the platform, allowing users to obtain metagenomic profiles through the same standard pipeline. In addition, common statistical and visualization tools for microbiome research were integrated into the online analysis platform.
Conclusions: The reference database built on MBCN was more comprehensive and accurate for profiling metagenomic reads. Used together with the MicrobiotaCN online analysis platform, it provides unified, one-stop metagenomic data analysis. It could therefore be a valuable resource for researchers who need profiles from different studies generated with a single comprehensive reference database for meta-analysis. All data are freely available at http://www.microbiota.cn.
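To make the 95% ANI clustering step concrete, the following is a minimal sketch of how genomes might be grouped into species-level clusters from pairwise ANI values. The abstract does not state which clustering algorithm or tool was used, so the single-linkage rule, the function name `cluster_by_ani`, and the genome identifiers and ANI values below are all illustrative assumptions, not the authors' pipeline.

```python
from collections import defaultdict


def cluster_by_ani(genomes, pairwise_ani, threshold=95.0):
    """Group genomes into clusters by single linkage on ANI.

    Two genomes end up in the same cluster if they are connected by a
    chain of pairs whose ANI is >= threshold. This linkage rule is an
    assumption made for illustration only.
    """
    # Union-find: every genome starts as its own cluster root.
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the root, shortening the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), ani in pairwise_ani.items():
        if ani >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = defaultdict(list)
    for g in genomes:
        clusters[find(g)].append(g)
    # Sort members and clusters so the output is deterministic.
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical genome IDs and ANI percentages (not data from the paper).
genomes = ["mag_001", "mag_002", "refseq_A", "uhgg_B"]
pairwise_ani = {
    ("mag_001", "refseq_A"): 98.2,
    ("mag_001", "mag_002"): 91.0,
    ("mag_002", "uhgg_B"): 96.4,
    ("refseq_A", "uhgg_B"): 89.5,
}

clusters = cluster_by_ani(genomes, pairwise_ani)
for members in clusters:
    print(members)
```

With these toy values, `mag_001` groups with `refseq_A` and `mag_002` groups with `uhgg_B`, because only those pairs reach the 95% threshold. In practice the pairwise ANI values would come from a dedicated ANI estimation tool, and a production pipeline may use a different linkage rule.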