SaprotHub: Making Protein Modeling Accessible to All Biologists

Jin Su, Zhikai Li, Chenchen Han, Yuyang Zhou, Yan He, Junjie Shan, Xibin Zhou, Xing Chang, Dacheng Ma, The OPMC, Martin Steinegger, Sergey Ovchinnikov, Fajie Yuan
DOI: https://doi.org/10.1101/2024.05.24.595648
2024-09-11
Abstract: Training and deploying deep learning models pose challenges for users without machine learning (ML) expertise. SaprotHub offers a user-friendly platform that democratizes the process of training, utilizing, storing, and sharing protein ML models, fostering collaboration within the biology community, all achievable with just a few clicks, regardless of ML background. At its core, Saprot is an advanced, foundational protein language model. Through its ColabSaprot framework, it supports potentially hundreds of protein training and prediction applications, enabling the co-construction and co-sharing of these trained models. This enhances user engagement and drives community-wide innovation.
Bioinformatics
What problem does this paper attempt to address?
The paper addresses the difficulty that researchers without machine learning (ML) expertise face when training and deploying protein ML models. The challenges include selecting an appropriate model architecture, managing complex programming details, preprocessing large-scale datasets, training model parameters, and evaluating and interpreting results. As AI models grow more complex, these hurdles often keep researchers without an ML background from participating in the field. To address this, the paper proposes SaprotHub, a user-friendly platform built on Google Colaboratory that simplifies the training, use, storage, and sharing of protein ML models without requiring advanced ML or programming knowledge. At the core of SaprotHub is Saprot, an advanced protein language model that supports a range of protein training and prediction applications through its ColabSaprot framework, with the aim of enhancing user engagement and driving innovation across the broader biological research community. The paper also introduces the Open Protein Modeling Consortium (OPMC), which aims to create a decentralized, unified repository of protein prediction models to facilitate model sharing and collaboration among researchers.