GlycanML: A Multi-Task and Multi-Structure Benchmark for Glycan Machine Learning

Minghao Xu,Yunteng Geng,Yihang Zhang,Ling Yang,Jian Tang,Wentao Zhang
2024-10-01
Abstract:Glycans are basic biomolecules and perform essential functions within living organisms. The rapid increase of functional glycan data provides a good opportunity for machine learning solutions to glycan understanding. However, there still lacks a standard machine learning benchmark for glycan property and function prediction. In this work, we fill this blank by building a comprehensive benchmark for Glycan Machine Learning (GlycanML). The GlycanML benchmark consists of diverse types of tasks including glycan taxonomy prediction, glycan immunogenicity prediction, glycosylation type prediction, and protein-glycan interaction prediction. Glycans can be represented by both sequences and graphs in GlycanML, which enables us to extensively evaluate sequence-based models and graph neural networks (GNNs) on benchmark tasks. Furthermore, by concurrently performing eight glycan taxonomy prediction tasks, we introduce the GlycanML-MTL testbed for multi-task learning (MTL) algorithms. Also, we evaluate how taxonomy prediction can boost other three function prediction tasks by MTL. Experimental results show the superiority of modeling glycans with multi-relational GNNs, and suitable MTL methods can further boost model performance. We provide all datasets and source codes at <a class="link-external link-https" href="https://github.com/GlycanML/GlycanML" rel="external noopener nofollow">this https URL</a> and maintain a leaderboard at <a class="link-external link-https" href="https://GlycanML.github.io/project" rel="external noopener nofollow">this https URL</a>
Machine Learning
What problem does this paper attempt to address?