Improving Gene Annotation of the Peanut Genome by Integrated Proteogenomics Workflow

Haifen Li, Ruo Zhou, Shaohang Xu, Xiaoping Chen, Yanbin Hong, Qing Lu, Hao Liu, Baojin Zhou, Xuanqiang Liang
DOI: https://doi.org/10.1021/acs.jproteome.9b00723
2020-01-01
Journal of Proteome Research
Abstract: Peanut (Arachis hypogaea L.) is a staple crop in semiarid tropical and subtropical regions. Although the genome of peanut has been fully sequenced, the current gene annotations are still incomplete. New technologies in genomics and proteomics have resulted in the emergence of proteogenomics, which can integrate genomic, transcriptomic, and proteomic data for improving gene annotation. In the present study, we collected RNA-seq and proteomic data from multiple tissues such as seed, shell, and gynophore of peanut and utilized a proteogenomic approach to improve the gene annotation of peanut based on these data. A total of 1 935 655 904 RNA-seq reads and 7 490 280 MS/MS spectra were collected. Ultimately, 13 767 annotated genes were found with evidence at the protein level, and seven novel protein-coding genes were found with both RNA-seq and proteomics evidence. In addition, 35 gene models were updated based on proteomics data. Proteogenomic approaches improved the gene annotation in certain aspects by integrating both RNA-seq and proteomic data. We expect that these approaches could help improve existing genome annotations of other species.