Integration of mass spectrometry and RNA-Seq data to confirm human ab initio predicted genes and lncRNAs.

Han Sun, Chen Chen, Meng Shi, Dandan Wang, Mingwei Liu, Daixi Li, Pengyuan Yang, Yixue Li, Lu Xie
DOI: https://doi.org/10.1002/pmic.201400174
2014-01-01
PROTEOMICS
Abstract: MS/MS has been used to improve genome annotation in various organisms. The classical approach is to construct a comprehensive theoretical peptide database using a six-frame translation model over all ORFs of a genome, and to search real MS/MS spectra against this database. In this work we took a more focused approach: we constructed a database containing only peptides from the ab initio predicted genes in the current human genome annotation, together with all theoretical peptides from currently annotated lncRNAs, and searched this database with MS/MS data from the human HeLa cell line. The purpose of this design is to find translation evidence for ab initio predicted genes and to rule out possibly misannotated lncRNAs in a systematic proteogenomics effort. To validate the proteogenomics results, we integrated RNA-Seq data analysis for the same HeLa cell line that generated the MS/MS data, and performed MRM experiments on self-cultured HeLa cell line samples. Six peptides were found to support ab initio predicted genes with both RNA-Seq and MRM validation, whereas none was found to support a translated lncRNA. This workflow could be flexibly applied to other human samples and datasets to help further improve human gene annotation.
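To make the database-construction step concrete, the following is a minimal sketch of six-frame translation followed by in-silico tryptic digestion, the general technique the abstract refers to. It is not the authors' pipeline: the function names, the `min_len` cutoff of 7 residues, and the simple trypsin rule (cleave after K or R unless followed by P, no missed cleavages) are illustrative assumptions.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Translate three forward and three reverse-complement reading frames."""
    seq = seq.upper()
    frames = []
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


def tryptic_peptides(protein, min_len=7):
    """Digest a translated frame in silico (hypothetical helper).

    Stop codons (*) split the frame into segments; each segment is cleaved
    after K or R unless the next residue is P. No missed cleavages are
    generated, and peptides shorter than min_len are dropped.
    """
    peptides = []
    for segment in protein.split("*"):
        start = 0
        for i, aa in enumerate(segment):
            nxt = segment[i + 1] if i + 1 < len(segment) else ""
            if aa in "KR" and nxt != "P":
                pep = segment[start:i + 1]
                if len(pep) >= min_len:
                    peptides.append(pep)
                start = i + 1
        tail = segment[start:]
        if len(tail) >= min_len:
            peptides.append(tail)
    return peptides


def peptide_database(sequences, min_len=7):
    """Collect the unique peptides from all six frames of each sequence."""
    db = set()
    for seq in sequences:
        for frame in six_frame_translations(seq):
            db.update(tryptic_peptides(frame, min_len))
    return db
```

Under these assumptions, a focused database of the kind described would be built by passing only the sequences of interest (for example, annotated lncRNA transcripts) to `peptide_database`, rather than the whole genome.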