Open-pFind Verified Four Missing Proteins from Multi-Tissues
Shujia Wu, Jinshuai Sun, Xi Wang, Feng Xu, Hao Chi, Yanchang Li, Bowen Zhong, Yuping Xie, Zhonghua Yan, Lei Chang, Dongxue Wang, Fuchu He, Junzhu Wu, Yao Zhang, Ping Xu
DOI: https://doi.org/10.1021/acs.jproteome.0c00370
2020-01-01
Journal of Proteome Research
Abstract: The Chromosome-Centric Human Proteome Project (C-HPP) was launched in 2012 to perfect the annotation of human protein existence by identifying stronger evidence of the expression of missing proteins (MPs) at the protein level. After an 8-year worldwide effort, the number of MPs in the neXtProt database decreased significantly, from 5511 (2012-02-24) to 1899 (2020-01-17). It is now more difficult to provide confident evidence for the remaining MPs because of their specific characteristics, including low abundance, low molecular weight, unexpected modifications, transmembrane structure, and tissue-specific expression. A higher resolution mass spectrometry (MS) interpretation engine, combined with multi-tissue large-scale proteomics, might provide an opportunity to identify these buried MPs in complex samples. In this study, open-pFind was used to mine MPs from 20 pairs of healthy human tissues reported by Wang et al. (Mol. Syst. Biol. 2019, 15 (2), e8503), combined with our large-scale testis data set digested by three enzymes (Glu-C, Lys-C, and trypsin) with specificity for different amino acid residues (J. Proteome Res. 2019, 18 (12), 4189-4196). A total of 1 535 536 peptides with 17 283 477 peptide-spectrum matches (PSMs) were mapped to 14 279 protein entries at a false discovery rate of <1% at the PSM, peptide, and protein levels. A total of 103 MP candidates were identified, of which 86 had more unique peptides than in our single testis tissue data set. After rigorous screening, manual checks, peptide synthesis, and matching with documented peptides from PeptideAtlas, we validated four MPs at the protein level: P0C7T8 (duodenum and small intestine), Q8WWZ4 (stomach and rectum), Q8IV35 (fallopian tube), and O14921 (tonsil). All MS raw files have been deposited to ProteomeXchange with the identifier PXD021391.