HMOD: An Omics Database for Herbal Medicine Plants
Xiao Wang, Jiajin Zhang, Simei He, Yuanni Gao, Xiaoqin Ma, Yun Gao, Guanghui Zhang, Ling Kui, Wen Wang, Ying Wang, Shengchao Yang, Yang Dong
DOI: https://doi.org/10.1016/j.molp.2018.03.002
IF: 27.5
2018-01-01
Molecular Plant
Abstract: More than 50% of drugs are derived from chemical compounds isolated from plants (Fabricant and Farnsworth, 2001; Yarnell and Abascal, 2002). With advances in sequencing technology and synthetic biology, researchers can now obtain molecular information from plant transcriptomic and genomic data and then use bacteria to synthesize the desired compounds (Atanasov et al., 2015; Smanski et al., 2016). A growing number of researchers are publishing omics data generated from herbal plants. However, there is concern that data generation may become redundant, and some researchers have expressed the need for a comprehensive, reliable omics database for herbal medicine plants. Such a database is important for advancing research on the biogenesis and functions of herbal medicines (Yan et al., 2015; Zhang et al., 2015; Li et al., 2016; Xu et al., 2016).

We have built the Herbal Medicine Omics Database (HMOD, Figure 1A, http://herbalplant.ynau.edu.cn/) to provide a reliable omics resource on herbal medicine plants for all researchers. The database catalogs publicly available genomic, transcriptomic, pathway, and metabolomic data for herbal medicine plants, together with unpublished transcriptomic data and enzyme data identified by KEGG annotation. A Generic Genome Browser (GBrowse) is integrated for viewing genome sequences, and a BLAST tool is also provided. To keep pace with new advances and analysis tools for herbal medicine plants, HMOD will be updated as new data become available (ftp://202.203.187.112:2222/).

HMOD collects 23 published genomes of medicinal herbs, including Panax notoginseng and other important species (Figure 1B and Supplemental Table 1) (Chen et al., 2017; Zhang et al., 2017). The entry for each species consists of an introduction, resequencing information, downloadable data, the GBrowse genome browser, and BLAST. The introduction describes the plant's geographic distribution and pharmacological functions. Because no genome resequencing data have yet been published for any medicinal herb, single nucleotide polymorphism (SNP) information and analyses will be added once such data become available. For the downloadable data, we summarize the year of publication, institution, sample information, sequencing platform, data size, assembly results, and annotation methods of each project (Figure 1C). Each genome is provided as a FASTA-formatted genome file, with coding sequences (cds) in FASTA format and protein data in both FASTA and GFF3 formats; all files can be downloaded via FTP. The GBrowse browser and BLAST tool are linked for further genetic and enzyme-based analyses.

HMOD contains 172 transcriptomes from 57 plant families (124 published and 48 sequenced, de novo assembled, and annotated in this project; Figure 1D; Supplemental Tables 2 and 3). As with the genomes, each transcriptome entry consists of an introduction, downloadable data, and BLAST. The introduction describes the geographic distribution and pharmacological functions of the plant, and the downloadable data section summarizes the year of publication, institution, sample information, sequencing platform, data size, assembly results, and annotation methods of each project.
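Because the genome and cds files are plain FASTA, their basic assembly statistics can be checked locally after download. The sketch below is illustrative only and is not an HMOD tool: the function names are hypothetical, it assumes an uncompressed FASTA file, and it computes GC content over all characters (including N), which is meaningful only for nucleotide files.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


def n50(lengths: List[int]) -> int:
    """Length of the shortest sequence among the longest ones covering >= 50% of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarize(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Count sequences, total length, N50, and GC percentage (over all characters)."""
    seqs = [seq.upper() for _, seq in records]
    lengths = [len(s) for s in seqs]
    total = sum(lengths)
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    return {
        "sequences": len(seqs),
        "total_length": total,
        "n50": n50(lengths),
        "gc_percent": 100.0 * gc / total if total else 0.0,
    }


# Toy input standing in for a downloaded genome file.
example = """>scaffold1 toy
ATGCGC
GGCC
>scaffold2
ATAT
>scaffold3
GC
""".splitlines()

stats = summarize(read_fasta(example))
print(stats)  # {'sequences': 3, 'total_length': 16, 'n50': 10, 'gc_percent': 62.5}
```

For a real download, pass an open file handle instead, e.g. `summarize(read_fasta(open("genome.fa")))`, where `genome.fa` is a placeholder filename.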
For the published transcriptomic data, HMOD links to the corresponding SRA data deposited at NCBI; for the transcriptomes de novo assembled in this project, FASTA-formatted unigene, cds, and protein sequence files can be downloaded directly from the database. The de novo assembled transcriptomes are also searchable with the BLAST tool.

Information on 18 major plant KEGG pathways, together with other websites related to herbal plants, is linked to HMOD (Figure 1E). KEGG annotation began by retrieving all KEGG Orthology (KO) identifiers and selecting those for these genomes and transcriptomes. The results are provided as downloadable tables listing the gene name, KO, gene ID in the omics data, match score, and gene description.

HMOD contains a summary of metabolomic data for 55 metabolites (Figure 1F and Supplemental Table 4). These data are organized into 35 plant families, and the accompanying tables include the year of publication, institution, sample information, and results, to help researchers follow advances in metabolomics research.

Diverse bioinformatics tools are available within HMOD. We used GBrowse, developed as part of the Generic Model Organism Database project (GMOD; http://gmod.org/wiki/GMOD), to visualize genome sequences, repeat sequences, and predicted genes. A variety of tracks can be displayed, including protein-coding genes, non-coding genes, GC content, and repetitive sequences (Figure 1G). BLAST lets users search against scaffolds and genes in the herbal plant genomes and transcriptomes; on the results page, each hit can be downloaded to view its sequence (Figure 1H). The search function retrieves omics information using the Latin name of a target plant as the keyword. In summary, HMOD provides a comprehensive set of omics data and KEGG pathway information for herbal medicine plants.
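The downloadable KEGG annotation tables list gene name, KO, gene ID, match score, and description, so a common first step is to collapse them into KO-to-gene lists. This is a minimal sketch under stated assumptions: the file is tab-separated with a hypothetical header row, a higher match score indicates a better hit, the sample rows are invented for illustration, and the score threshold is arbitrary.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Assumption: a tab-separated table with this hypothetical header row.
# The real column names/order in HMOD downloads may differ.
HEADER = "gene_name\tko\tgene_id\tscore\tdescription"


def load_kegg_table(handle) -> List[dict]:
    """Parse rows into dicts and convert the match score to a float."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["score"] = float(row["score"])
        rows.append(row)
    return rows


def genes_by_ko(rows: List[dict], min_score: float = 0.0) -> Dict[str, List[str]]:
    """Keep each gene's best-scoring KO, drop weak hits, and group gene IDs by KO."""
    best: Dict[str, dict] = {}
    for row in rows:
        gid = row["gene_id"]
        if gid not in best or row["score"] > best[gid]["score"]:
            best[gid] = row
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in best.values():
        if row["score"] >= min_score:
            groups[row["ko"]].append(row["gene_id"])
    return {ko: sorted(ids) for ko, ids in groups.items()}


# Made-up rows for illustration (gene IDs and scores are not real HMOD data).
sample = io.StringIO(
    HEADER + "\n"
    "CHS\tK00660\tg001\t512.3\tchalcone synthase\n"
    "CHS\tK00660\tg002\t498.0\tchalcone synthase\n"
    "ADH1\tK00001\tg003\t120.5\talcohol dehydrogenase\n"
    "ADH1\tK00001\tg001\t90.0\talcohol dehydrogenase\n"
)
rows = load_kegg_table(sample)
print(genes_by_ko(rows, min_score=100.0))  # {'K00660': ['g001', 'g002'], 'K00001': ['g003']}
```

Keeping only the best-scoring KO per gene is one common convention for resolving multiple assignments; it is a design choice of this sketch, not necessarily how the HMOD tables were built.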
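HMOD will be updated regularly with new datasets and improved functionality, to provide a more valuable resource for comparative genomics, transcriptomics, and synthetic biology studies. The project was supported by research funds from the National Natural Science Foundation of China (no. U1402262), the major Science and Technique Programs in Yunnan Province (no. 2016ZF001), and the Project of Young and Middle-aged Talent of Yunnan Province (Grant No. 2014HB011).

References:
Atanasov A.G., Waltenberger B., Pferschy-Wenzig E.M., Linder T., Wawrosch C., Uhrin P., Temml V., Wang L., Schwaiger S., Heiss E.H., et al. Discovery and resupply of pharmacologically active plant-derived natural products: a review. Biotechnol. Adv. 2015; 33: 1582-1614.
Chen W., Kui L., Zhang G., Zhu S., Zhang J., Wang X., Yang M., Huang H., Liu Y., Wang Y., et al. Whole-genome sequencing and analysis of the Chinese herbal plant Panax notoginseng. Mol. Plant. 2017; 10: 899-902.
Fabricant D.S., Farnsworth N.R. The value of plants used in traditional medicine for drug discovery. Environ. Health Perspect. 2001; 109: 69-75.
Li J., Chen C., Wang Z.Z. The complete chloroplast genome of the Dendrobium strongylanthum (Orchidaceae: Epidendroideae). Mitochondrial DNA A DNA Mapp. Seq. Anal. 2016; 27: 3048-3049.
Smanski M.J., Zhou H., Claesen J., Shen B., Fischbach M.A., Voigt C.A. Synthetic biology to access and expand nature's chemical diversity. Nat. Rev. Microbiol. 2016; 14: 135-149.
Xu H., Song J., Luo H., Zhang Y., Li Q., Zhu Y., Xu J., Li Y., Song C., Wang B., et al. Analysis of the genome sequence of the medicinal plant Salvia miltiorrhiza. Mol. Plant. 2016; 9: 949-952.
Yan L., Wang X., Liu H., Tian Y., Lian J., Yang R., Hao S., Wang X., Yang S., Li Q., et al. The genome of Dendrobium officinale illuminates the biology of the important traditional Chinese orchid herb. Mol. Plant. 2015; 8: 922-934.
Yarnell E., Abascal K. Dilemmas of traditional botanical research. HerbalGram. 2002; 55: 46.
Zhang D., Li W., Xia E.H., Zhang Q.J., Liu Y., Zhang Y., Tong Y., Zhao Y., Niu Y.C., Xu J.H., et al. The medicinal herb Panax notoginseng genome provides insights into ginsenoside biosynthesis and genome evolution. Mol. Plant. 2017; 10: 903-907.
Zhang G., Tian Y., Zhang J., Shu L., Yang S., Wang W., Sheng J., Dong Y., Chen W. Hybrid de novo genome assembly of the Chinese herbal plant danshen (Salvia miltiorrhiza Bunge). Gigascience. 2015; 4: 62.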