Assessment of the Limits of Predictability of Protein and Phosphorylation Levels in Cancer

Mi Yang, F. Petralia, Zhi Li, Hongyang Li, Weiping Ma, Xiaoyu Song, Sunkyu Kim, Heewon Lee, Han Yu, Bora Lee, Seohui Bae, Eunji Heo, Jan Kaczmarczyk, P. Stepniak, M. Warchoł, Thomas Yu, A. Calinawan, P. Boutros, S. Payne, B. Reva, Sandeep Kumar Dhanda, Emily Boja, H. Rodriguez, G. Stolovitzky, Y. Guan, Jaewoo Kang, Pei Wang, D. Fenyö, J. Saez-Rodriguez
DOI: https://doi.org/10.2139/ssrn.3554086
2020-03-30
Abstract: Even though cancer is driven by genomic alterations, the chain functions causing this disease are largely carried out by proteins. Proteins are also typically targeted in treatment. However, proteomes are harder and more expensive to measure than genomes and transcriptomes. Thus, it would be very valuable to accurately estimate protein levels using other omics data. To catalyze the development of solutions to this problem, and to answer fundamental questions about transcriptional and translational control, we leveraged the power of crowdsourcing via a collaborative competition: the NCI-CPTAC DREAM Proteogenomics Challenge. The best performance for predicting protein and phosphorylation levels was achieved by an ensemble of models that included as predictors the transcript levels of the corresponding genes, interactions between genes, conservation across tumor types and, for phosphorylation prediction, phosphosite proximity. Proteins from metabolic pathways were the best predicted, whereas complex proteins were the least well predicted. However, the performance of even the best-performing model was modest, suggesting that the levels of many proteins are strongly regulated through translational control and degradation. From the best-performing model, we identified common predictors that are predictive of survival outcome. Our results shed light on the potential application of computational models to large-scale proteogenomic characterization of cancer in order to better understand signaling dysregulation mechanisms in the disease.
Biology