Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties

Srivathsan Badrinarayanan, Chakradhar Guntuboina, Parisa Mollaei, Amir Barati Farimani
2024-07-03
Abstract: Peptides are essential in biological processes and therapeutics. In this study, we introduce Multi-Peptide, an innovative approach that combines transformer-based language models with Graph Neural Networks (GNNs) to predict peptide properties. We combine PeptideBERT, a transformer model tailored for peptide property prediction, with a GNN encoder to capture both sequence-based and structural features. By employing Contrastive Language-Image Pre-training (CLIP), Multi-Peptide aligns embeddings from both modalities into a shared latent space, thereby enhancing the model's predictive accuracy. Evaluations on hemolysis and nonfouling datasets demonstrate Multi-Peptide's robustness, achieving state-of-the-art 86.185% accuracy in hemolysis prediction. This study highlights the potential of multimodal learning in bioinformatics, paving the way for accurate and reliable predictions in peptide-based research and applications.
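The CLIP-style alignment can be illustrated with the symmetric contrastive loss introduced in the original CLIP work. The sketch below is a minimal NumPy version under stated assumptions: row i of each batch pairs a PeptideBERT sequence embedding with a GNN graph embedding of the same peptide, both already projected to a common dimension, and the temperature is fixed at 0.07 (CLIP learns it; the setting used in Multi-Peptide is not given here). The helper names are hypothetical, not taken from the Multi-Peptide code, and random arrays stand in for the two encoders.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along one axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_loss(seq_emb: np.ndarray, graph_emb: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric CLIP-style contrastive loss (hypothetical helper).

    Row i of seq_emb and row i of graph_emb are assumed to describe the same
    peptide (a positive pair); all other rows in the batch act as negatives.
    """
    s = l2_normalize(seq_emb)
    g = l2_normalize(graph_emb)
    logits = s @ g.T / temperature  # (batch, batch) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    # Cross-entropy with the matching pair (the diagonal) as the target:
    # softmax over rows gives sequence -> graph, over columns graph -> sequence.
    loss_seq_to_graph = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_graph_to_seq = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float((loss_seq_to_graph + loss_graph_to_seq) / 2)


# Toy usage with random stand-ins for the two encoders' outputs.
rng = np.random.default_rng(0)
seq_batch = rng.normal(size=(8, 32))
aligned_loss = clip_loss(seq_batch, seq_batch + 0.01 * rng.normal(size=(8, 32)))
random_loss = clip_loss(seq_batch, rng.normal(size=(8, 32)))
print(f"aligned pairs: {aligned_loss:.4f}, unrelated pairs: {random_loss:.4f}")
```

Because both directions of the cross-entropy are averaged, neither modality is privileged; the loss falls as matched sequence and graph embeddings become more similar to each other than to the mismatched ones in the batch.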
Quantitative Methods, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses how to use multimodal information to improve the accuracy of peptide property prediction. It introduces Multi-Peptide, which combines PeptideBERT, a Transformer-based language model, with a Graph Neural Network (GNN) encoder so that both the sequence and the structural features of a peptide are captured. Contrastive Language-Image Pre-training (CLIP) is used to align the embeddings of the two modalities in a shared latent space, which improves the model's predictive performance.

Experimentally, Multi-Peptide reaches state-of-the-art accuracy (86.185%) on the hemolysis prediction task, illustrating the potential of multimodal learning in bioinformatics. On the nonfouling dataset, however, its performance is slightly lower than that of the sequence-only PeptideBERT model. Even so, Multi-Peptide shows that sequence and structural data can be integrated in a single model, providing a foundation for further optimization.
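The structural branch can be pictured as a graph whose nodes are residues. The sketch below is a generic graph-convolution encoder in NumPy, not the GNN architecture Multi-Peptide actually uses. It assumes one-hot residue features, connects only sequence-adjacent residues (a structure-aware graph could instead add edges between residues that lie close together in 3D), uses random untrained weights, and mean-pools the residues into a single vector that could serve as the graph embedding in the CLIP alignment step. All names and sizes are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence: str) -> np.ndarray:
    """Node features: one 20-dimensional one-hot vector per residue."""
    feats = np.zeros((len(sequence), len(AMINO_ACIDS)))
    for i, residue in enumerate(sequence):
        feats[i, AMINO_ACIDS.index(residue)] = 1.0
    return feats


def chain_adjacency(n_residues: int) -> np.ndarray:
    """Hypothetical graph: link each residue to its neighbours along the backbone."""
    adj = np.zeros((n_residues, n_residues))
    for i in range(n_residues - 1):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    return adj


def gcn_layer(h: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One message-passing step: mean over self + neighbours, linear map, ReLU."""
    a_hat = adj + np.eye(adj.shape[0])      # self-loops keep each node's own features
    deg = a_hat.sum(axis=1, keepdims=True)  # number of terms averaged per node
    messages = (a_hat @ h) / deg            # mean aggregation over the neighbourhood
    return np.maximum(messages @ weight, 0.0)


def graph_embedding(sequence: str, weights: list) -> np.ndarray:
    """Stack GCN layers, then mean-pool residues into one peptide-level vector."""
    h = one_hot(sequence)
    adj = chain_adjacency(len(sequence))
    for w in weights:
        h = gcn_layer(h, adj, w)
    return h.mean(axis=0)


# Toy usage: random (untrained) weights and an arbitrary example sequence.
rng = np.random.default_rng(0)
layer_weights = [0.1 * rng.normal(size=(20, 64)), 0.1 * rng.normal(size=(64, 32))]
embedding = graph_embedding("GIGKFLHSAKKFGKAFVGEIMNS", layer_weights)
print(embedding.shape)
```

Mean pooling makes the output size independent of peptide length, which is what allows a fixed-size graph embedding to be paired with a fixed-size sequence embedding during alignment.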