Protein-Mamba: Biological Mamba Models for Protein Function Prediction

Bohao Xu, Yingzhou Lu, Yoshitaka Inoue, Namkyeong Lee, Tianfan Fu, Jintai Chen
2024-09-23
Abstract: Protein function prediction is a pivotal task in drug discovery, significantly impacting the development of effective and safe therapeutics. Traditional machine learning models often struggle with the complexity and variability inherent in predicting protein functions, necessitating more sophisticated approaches. In this work, we introduce Protein-Mamba, a novel two-stage model that leverages both self-supervised learning and fine-tuning to improve protein function prediction. The pre-training stage allows the model to capture general chemical structures and relationships from large, unlabeled datasets, while the fine-tuning stage refines these insights using specific labeled datasets, resulting in superior prediction performance. Our extensive experiments demonstrate that Protein-Mamba achieves performance competitive with a couple of state-of-the-art methods across a range of protein function datasets. The model's ability to use both unlabeled and labeled data effectively highlights the potential of self-supervised learning in advancing protein function prediction and offers a promising direction for future research in drug discovery.
Machine Learning, Biomolecules, Quantitative Methods
What problem does this paper attempt to address?
The paper addresses the problem of protein function prediction. Protein function prediction plays a crucial role in drug discovery: identifying potential drug candidates early can reduce the time and cost of drug development. However, traditional machine learning models often perform poorly given the complexity and variability of protein function prediction. To address this, the paper proposes Protein-Mamba, a two-stage model that combines self-supervised pre-training with fine-tuning. The pre-training stage uses a large amount of unlabeled data (such as amino acid sequences) to learn general chemical structures and relationships in proteins; the fine-tuning stage then refines these learned representations on specific labeled datasets, thereby improving prediction performance. Experimental results show that Protein-Mamba achieves performance competitive with a couple of state-of-the-art methods across a range of protein function datasets, demonstrating the potential of self-supervised learning for protein function prediction and suggesting a promising direction for future drug discovery research.
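
The two-stage data flow described above can be illustrated with a minimal sketch. It shows only how training examples might be prepared for each stage, not the model itself. The next-token objective, the vocabulary layout, and the helper names (`encode`, `pretraining_example`, `finetuning_example`) are assumptions made for illustration; the summary above does not specify the pre-training objective or tokenization that Protein-Mamba uses.

```python
# Illustrative sketch only: names and the next-token objective are assumptions,
# not details taken from the Protein-Mamba paper.

# The 20 standard amino acids; id 0 is reserved here for unknown residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_ID = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence):
    """Map an amino acid string to integer token ids (0 for unknown residues)."""
    return [TOKEN_ID.get(aa, 0) for aa in sequence.upper()]


def pretraining_example(sequence):
    """Stage 1 (self-supervised): the targets come from the sequence itself.

    Assumed objective: predict each residue from the residues before it,
    so no human-provided label is needed.
    """
    ids = encode(sequence)
    return ids[:-1], ids[1:]


def finetuning_example(sequence, label):
    """Stage 2 (supervised): pair the encoded sequence with a function label."""
    return encode(sequence), label


# Unlabeled sequence used for pre-training.
inputs, targets = pretraining_example("MKTAYIAK")

# The same kind of sequence, now with a (hypothetical) binary function label.
ft_inputs, ft_label = finetuning_example("MKTAYIAK", 1)
```

The point of the sketch is the contrast between the stages: pre-training derives its supervision from the raw sequence, so it can draw on large unlabeled collections, while fine-tuning requires an explicit label for each sequence and therefore typically uses a smaller dataset.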