Systematic Analysis of Missing Proteins Provides Clues to Help Define All of the Protein-Coding Genes on Human Chromosome 1.

Chengpu Zhang, Ning Li, Linhui Zhai, Shaohang Xu, Xiaohui Liu, Yizhi Cui, Jie Ma, Mingfei Han, Jing Jiang, Chunyuan Yang, Fengxu Fan, Liwei Li, Peibin Qin, Qing Yu, Cheng Chang, Na Su, Junjie Zheng, Tao Zhang, Bo Wen, Ruo Zhou, Liang Lin, Zhilong Lin, Baojin Zhou, Yang Zhang, Guoquan Yan, Yinkun Liu, Pengyuan Yang, Kun Guo, Wei Gu, Yang Chen, Gong Zhang, Qing-Yu He, Songfeng Wu, Tong Wang, Huali Shen, Quanhui Wang, Yunping Zhu, Fuchu He, Ping Xu
DOI: https://doi.org/10.1021/pr400900j
2013-01-01
Journal of Proteome Research
Abstract: Our first proteomic exploration of human chromosome 1 began in 2012 (CCPD 1.0), and the genome-wide characterization of the human proteome through public resources revealed that 32-39% of proteins on chromosome 1 remain unidentified. To characterize all of the missing proteins, we applied an OMICS-integrated analysis of three human liver cell lines (Hep3B, MHCC97H, and HCCLM3) using mRNA and ribosome nascent-chain complex-bound mRNA deep sequencing and proteome profiling, contributing mass spectrometric evidence of 60 additional chromosome 1 gene products. Integration of the annotation information from public databases revealed that 84.6% of genes on chromosome 1 had high-confidence protein evidence. Hierarchical analysis demonstrated that the remaining 320 missing genes were either experimentally or biologically explainable; 128 genes were found to be tissue-specific or rarely expressed in some tissues, whereas 91 proteins were uncharacterized mainly due to database annotation diversity, 89 were genes with low mRNA abundance or unsuitable protein properties, and 12 genes were identifiable theoretically because of a high abundance of mRNAs/RNC-mRNAs and the existence of proteotypic peptides. The relatively large contribution made by the identification of enriched transcription factors suggested specific enrichment of low-abundance protein classes, and SRM/MRM could capture high-priority missing proteins. Detailed analyses of the differentially expressed genes indicated that several gene families located on chromosome 1 may play critical roles in mediating hepatocellular carcinoma invasion and metastasis. All mass spectrometry proteomics data corresponding to our study were deposited in the ProteomeXchange under the identifiers PXD000529, PXD000533, and PXD000535.