Panax ginseng genome examination for ginsenoside biosynthesis
Jiang Xu, Yang Chu, Baosheng Liao, Shuiming Xiao, Qinggang Yin, Rui Bai, He Su, Linlin Dong, Xiwen Li, Jun Qian, Jingjing Zhang, Yujun Zhang, Xiaoyan Zhang, Mingli Wu, Jie Zhang, Guozheng Li, Lei Zhang, Zhenzhan Chang, Yuebin Zhang, Zhengwei Jia, Zhixiang Liu, Daniel Afreh, Ruth Nahurira, Lianjuan Zhang, Ruiyang Cheng, Yingjie Zhu, Guangwei Zhu, Wei Rao, Chao Zhou, Lirui Qiao, Zhihai Huang, Yung-Chi Cheng, Shilin Chen
DOI: https://doi.org/10.1093/gigascience/gix093
IF: 7.658
2017-11-01
GigaScience
Abstract: Ginseng, which contains ginsenosides as bioactive compounds, has been regarded as an important traditional medicine for several millennia. However, the genetic background of ginseng remains poorly understood, partly because of the plant's large and complex genome. We report the entire genome sequence of Panax ginseng, obtained using next-generation sequencing. The 3.5-Gb nucleotide sequence contains more than 60% repeats and encodes 42 006 predicted genes. Twenty-two transcriptome datasets and mass spectrometry images of ginseng roots were used to precisely quantify the functional genes. Thirty-one genes were identified as being involved in the mevalonic acid pathway. Eight of these genes were annotated as 3-hydroxy-3-methylglutaryl-CoA reductases, which displayed diverse structures and expression characteristics. A total of 225 UDP-glycosyltransferases (UGTs) were identified, making UGTs one of the largest gene families in ginseng. Tandem repeats contributed to the duplication and divergence of UGTs. Molecular modeling of UGTs in the 71st, 74th, and 94th families revealed a regiospecific conserved motif located at the N-terminus. Molecular docking predicted that this motif captures ginsenoside precursors. The ginseng genome represents a valuable resource for understanding and improving the breeding, cultivation, and synthetic biology of this key herb.