Widespread noncoded amino acids in human proteome
Jing-Hua Yang, Xinjun Chen, Jing Gong, Han Zhao, Cuiling Li, Baibin Bi, Fengqin Wang, Shengnan Sun, Xingyuan Wang, Xin Lv, Baobo Zhang, Tao Huang, Kazem M. Azadzoi, Feng Shi, Xianglong Kong, Minglei Shu, Yinglong Wang, Y. Eugene Chin, Wan Huang, Zhinan Chen, Zi-Jiang Chen
DOI: https://doi.org/10.1101/292474
2018-01-01
bioRxiv
Abstract: Protein sequences are usually deduced by translating the coding genome; however, their amino acid residues are seldom determined directly across the proteome. Herein, we describe a systematic workflow for identifying all possible protein residues that differ from the coding genome, termed noncoded amino acids (ncAAs). By measuring the mass differences between the coded amino acids and the actual protein residues in human spermatozoa, we detected over a million nonzero delta masses, falling into 424 high-quality Gaussian clusters and 571 high-confidence ncAAs spanning 29,053 protein sites. Most ncAAs are novel, have unresolved side chains, and discriminate between healthy individuals and patients with oligoasthenospermia. For validation, 40 of the 98 ncAAs that matched amino acid substitutions were confirmed by exon sequencing. This workflow revealed the widespread existence of previously unreported ncAAs in the sperm proteome, which represents a new dimension in the understanding of amino acid polymorphisms at the proteomic level.
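The delta-mass idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' workflow: the function names `delta_mass` and `matching_substitutions`, the 0.005 Da tolerance, and the reduced residue table are assumptions made for illustration. Only the monoisotopic residue masses are standard values. The sketch computes the difference between an observed residue mass and the mass of the genome-coded residue, then lists the standard amino acids whose substitution would explain that difference.

```python
# Monoisotopic residue masses (Da) for a subset of the standard amino acids.
# The subset is chosen for brevity; a real analysis would include all twenty.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "M": 131.04049,
}


def delta_mass(coded, observed_mass):
    """Observed residue mass minus the mass of the genome-coded residue."""
    return observed_mass - RESIDUE_MASS[coded]


def matching_substitutions(coded, delta, tol=0.005):
    """Residues whose substitution for `coded` explains `delta` within `tol` Da.

    `tol` is a hypothetical tolerance, not a value taken from the paper.
    """
    base = RESIDUE_MASS[coded]
    return sorted(
        aa for aa, mass in RESIDUE_MASS.items()
        if aa != coded and abs((mass - base) - delta) <= tol
    )


# A site coded as alanine but observed with the residue mass of serine.
d = delta_mass("A", 87.03203)
candidates = matching_substitutions("A", d)
```

A delta mass that matches a substitution is only a candidate, because a chemical modification can produce the same shift (alanine to serine, for instance, adds the mass of one oxygen atom). This is consistent with the abstract's use of exon sequencing to confirm substitution-matched ncAAs.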