GMrepo v2: a curated human gut microbiome database with special focus on disease markers and cross-dataset comparison

Die Dai, Jiaying Zhu, Chuqing Sun, Min Li, Jinxin Liu, Sicheng Wu, Kang Ning, Li-jie He, Xing-Ming Zhao, Wei-Hua Chen
DOI: https://doi.org/10.1093/nar/gkab1019
IF: 14.9
2021-11-12
Nucleic Acids Research
Abstract: GMrepo (data repository for Gut Microbiota) is a database of curated and consistently annotated human gut metagenomes. Its main purposes are to increase the reusability and accessibility of human gut metagenomic data and to enable cross-project and cross-phenotype comparisons. To achieve these goals, we manually curated the metadata and organized the datasets in a phenotype-centric manner. GMrepo v2 contains 353 projects and 71,642 runs/samples, substantially more than the previous version. Of these runs/samples, 45,111 were obtained by 16S rRNA amplicon sequencing and 26,531 by whole-genome metagenomic sequencing. We also increased the number of phenotypes from 92 to 133. In addition, we introduced disease-marker identification and cross-project/phenotype comparison. We first identified disease markers between two phenotypes (e.g. health versus disease) on a per-project basis for selected projects. We then compared the identified markers for each phenotype pair across datasets to facilitate the identification of microbial markers that are consistent across datasets. Finally, we provided a marker-centric view that lets users check whether a marker shows different trends in different diseases. So far, GMrepo includes 592 marker taxa (350 species and 242 genera) for 47 phenotype pairs, identified from 83 selected projects. GMrepo v2 is freely available at: https://gmrepo.humangut.info.
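The cross-dataset comparison described in the abstract can be illustrated with a minimal sketch. It assumes that markers for one phenotype pair have already been called per project, and it labels a taxon "consistent" when every project reporting it agrees on the direction of enrichment. The project IDs, taxon names, the `summarize_markers` function, and the `min_projects` threshold are all hypothetical; this is not GMrepo's actual pipeline or API, and the abstract does not specify how consistency is scored.

```python
from collections import defaultdict

# Hypothetical per-project marker calls for ONE phenotype pair
# (e.g. health versus a disease). Each project maps a taxon to the
# phenotype in which that taxon was found enriched. All values are invented.
per_project_markers = {
    "project_A": {"taxon_1": "disease", "taxon_2": "health", "taxon_3": "disease"},
    "project_B": {"taxon_1": "disease", "taxon_2": "disease"},
    "project_C": {"taxon_1": "disease", "taxon_3": "disease"},
}


def summarize_markers(markers_by_project, min_projects=2):
    """Label each taxon seen in at least `min_projects` projects.

    A taxon is "consistent" if all projects agree on its direction,
    and "conflicting" otherwise. (Assumed rule, for illustration only.)
    """
    directions = defaultdict(list)
    for markers in markers_by_project.values():
        for taxon, direction in markers.items():
            directions[taxon].append(direction)

    summary = {}
    for taxon, dirs in directions.items():
        if len(dirs) < min_projects:
            continue  # too few projects to compare across datasets
        summary[taxon] = "consistent" if len(set(dirs)) == 1 else "conflicting"
    return summary


summary = summarize_markers(per_project_markers)
for taxon, label in sorted(summary.items()):
    print(taxon, label)
```

The same grouping, keyed by taxon across several phenotype pairs rather than across projects, would give a rough analogue of the marker-centric view, again only as an assumption about how such a view could be built.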
biochemistry & molecular biology