Community Evaluation of Glycoproteomics Informatics Solutions Reveals High-Performance Search Strategies of Serum N- and O-Glycopeptide Data
Rebeca Kawahara, Anastasia Chernykh, Kathirvel Alagesan, Marshall Bern, Weiqian Cao, Robert J. Chalkley, Kai Cheng, Matthew S. Choo, Nathan Edwards, Radoslav Goldman, Marcus Hoffmann, Yingwei Hu, Yifan Huang, Jin Young Kim, Doron Kletter, Benoit Liquet-Weiland, Mingqi Liu, Yehia Mechref, Bo Meng, Sriram Neelamegham, Terry Nguyen-Khuong, Jonas Nilsson, Adam Pap, Gun Wook Park, Benjamin L. Parker, Cassandra L. Pegg, Josef M. Penninger, Toan K. Phung, Markus Pioch, Erdmann Rapp, Enes Sakalli, Miloslav Sanda, Benjamin L. Schulz, Nichollas E. Scott, Georgy Sofronov, Johannes Stadlmann, Sergey Y. Vakhrushev, Christina M. Woo, Hung-Yi Wu, Pengyuan Yang, Wantao Ying, Hui Zhang, Yong Zhang, Jingfu Zhao, Joseph Zaia, Stuart M. Haslam, Giuseppe Palmisano, Jong Shin Yoo, Göran Larson, Kai-Hooi Khoo, Katalin F. Medzihradszky, Daniel Kolarich, Nicolle H. Packer, Morten Thaysen-Andersen
DOI: https://doi.org/10.1101/2021.03.14.435332
2021-01-01
Abstract: Glycoproteome profiling (glycoproteomics) is a powerful yet analytically challenging research tool. The complex tandem mass spectra generated from glycopeptide mixtures require sophisticated analysis pipelines for structural determination. Diverse software tools aiding the process have appeared, but their relative performance remains untested. Conducted through the HUPO Human Proteome Project – Human Glycoproteomics Initiative, this community study, comprising both developers and users of glycoproteomics software, evaluates the performance of informatics solutions for system-wide glycopeptide analysis. Mass spectrometry-based glycoproteomics datasets from human serum were shared with all teams. The relative team performance for N- and O-glycopeptide data analysis was comprehensively established and validated through orthogonal performance tests. Excitingly, several high-performance glycoproteomics informatics solutions were identified. While the study illustrated that significant informatics challenges remain, as indicated by a high discordance between annotated glycopeptides, lists of high-confidence (consensus) glycopeptides were compiled from the standardised team reports. Deep analysis of the performance data revealed key performance-associated search variables and led to recommendations for improved “high coverage” and “high accuracy” glycoproteomics search strategies. This study concludes that diverse software for comprehensive glycopeptide data analysis exists, points to several high-performance search strategies, and specifies key variables that may guide future software development and assist informatics decision-making in glycoproteomics.