ProteinAligner: A Multi-modal Pretraining Framework for Protein Foundation Models

Li Zhang, Han Guo, Leah V Schaffer, Young Su Ko, Digvijay Singh, Hamid Rahmani, Danielle Grotjahn, Elizabeth Villa, Michael Gilson, Wei Wang, Trey Ideker, Eric Xing, Pengtao Xie
DOI: https://doi.org/10.1101/2024.10.06.616870
2024-10-06
Abstract: Protein foundation models, particularly protein language models, have demonstrated strong success in learning meaningful representations of proteins using transformer architectures pretrained on large-scale protein datasets with self-supervised learning. These representations have been highly effective for downstream tasks such as predicting protein functions and properties. However, most current protein foundation models focus on pretraining with amino acid sequences, often neglecting additional modalities like protein structures and related literature, both of which provide valuable insights. To address this gap, we propose a multi-modal pretraining approach that integrates three key modalities: protein sequences, structures, and literature text. In our framework, the protein sequence modality serves as the anchor, with the other two modalities aligned to it, enhancing the model's capacity to capture more comprehensive protein information. ProteinAligner outperformed state-of-the-art protein foundation models in predicting protein functions and properties across diverse downstream tasks.
Bioinformatics
What problem does this paper attempt to address?
The paper addresses a limitation of current protein foundation models: they are pretrained mainly on amino acid sequences and ignore other informative modalities, in particular protein structures and related literature text. These modalities carry biological information that is needed for a fuller understanding of protein functions and properties. Specifically, the paper points out:

1. **Protein structure**: It provides three-dimensional information that helps explain how proteins fold and interact with other molecules, which directly affects their biological functions.
2. **Literature text**: It contains experimentally verified context about protein mechanisms, behaviors, and interactions, which is difficult to infer from sequences or structures alone.

To close this gap, the paper proposes ProteinAligner, a multi-modal pretraining framework that integrates three modalities: protein sequences, structures, and literature text. The sequence modality serves as the anchor, and the structure and text modalities are aligned to it. This lets the model learn richer and more comprehensive protein representations, which in turn improves accuracy on downstream tasks such as predicting protein functions and properties.

In summary, ProteinAligner aims to strengthen protein foundation models by combining multi-modal data so that protein functions and properties can be predicted more accurately.
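
The idea of aligning structure and text to a sequence anchor can be illustrated with a small sketch. The text above does not say which alignment objective ProteinAligner uses, so the InfoNCE-style contrastive loss below is an assumption chosen for illustration, not the authors' method. The function names (`info_nce`, `anchor_alignment_loss`) and the toy vectors, which stand in for encoder outputs, are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, others, temperature=0.07):
    """One-directional InfoNCE: row i of `anchors` should match row i of `others`.

    Assumption: this contrastive objective is only an illustrative choice.
    """
    a = [normalize(v) for v in anchors]
    b = [normalize(v) for v in others]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the true partner (index i).
        total += log_denominator - logits[i]
    return total / len(a)


def anchor_alignment_loss(seq, struct, text, temperature=0.07):
    """Align structure and text to the sequence anchor.

    Structure and text are never compared with each other directly;
    both are pulled toward the sequence embedding of the same protein.
    """
    def symmetric(x, y):
        return 0.5 * (info_nce(x, y, temperature) + info_nce(y, x, temperature))

    return symmetric(seq, struct) + symmetric(seq, text)


# Toy embeddings for three proteins (hypothetical values, not real encoder outputs).
seq = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
struct = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
text = [[0.8, 0.0, 0.2], [0.0, 0.8, 0.2], [0.2, 0.0, 0.8]]

matched = anchor_alignment_loss(seq, struct, text)
# Rotating the rows pairs each sequence with another protein's structure and text.
shuffled = anchor_alignment_loss(seq, struct[1:] + struct[:1], text[1:] + text[:1])
print(f"matched loss:  {matched:.4f}")
print(f"shuffled loss: {shuffled:.4f}")
```

In this sketch the loss is lower when each sequence is paired with its own structure and text than when the pairs are shuffled, which is the behavior an alignment objective is meant to reward. Using a single anchor means only two pairwise terms are needed rather than all three modality pairs.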