Enhancing Antibody Language Models with Structural Information

Justin Barton, Jacob D. Galson, Jinwoo Leem
DOI: https://doi.org/10.1101/2023.12.12.569610
2024-01-04
Abstract: The central tenet of molecular biology is that a protein’s amino acid sequence determines its three-dimensional structure, and thus its function. However, proteins with similar sequences do not always fold into the same shape, and vice versa, dissimilar sequences can adopt similar folds. In this work, we explore antibodies, a class of proteins in the immune system, whose local shapes are highly unpredictable, even with small variations in their sequence. Inspired by the CLIP method [ ], we propose a multimodal contrastive learning approach, contrastive sequence-structure pre-training (CSSP), which amalgamates the representations of antibody sequences and structures in a mutual latent space. Integrating structural information leads both antibody and protein language models to show better correspondence with structural similarity and improves accuracy and data efficiency in downstream binding prediction tasks. We provide an optimised CSSP-trained model, AntiBERTa2-CSSP, for non-commercial use at .
Bioinformatics
What problem does this paper attempt to address?
The paper aims to address the challenges that antibody language models face in relating antibody sequences to structures. Specifically:

1. **Enhancing the structural awareness of antibody language models**: Incorporating structural information improves the model's grasp of structural similarity, so it better captures the complex relationship between antibody sequences and their three-dimensional structures.
2. **Improving antigen-binding prediction performance**: Contrastive sequence-structure pre-training (CSSP) improves the model's accuracy and data efficiency when predicting antibody-antigen binding from limited experimental data.
3. **Increasing the model's generalization ability**: Even with limited data, pre-training with structural information improves the model's performance and reduces the need for experimental data, which can accelerate antibody engineering research.
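
The abstract describes CSSP as a CLIP-inspired method that places sequence and structure representations in a shared latent space. As a rough illustration of what a CLIP-style objective looks like, the sketch below computes a symmetric contrastive loss over a batch of paired sequence and structure embeddings. It is not the authors' implementation: the function names, the use of cosine similarity, and the temperature of 0.07 are all assumptions made for this example, and the embeddings are taken as given rather than produced by any particular encoder.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def cross_entropy_on_diagonal(logits):
    """Mean of -log softmax(row_i)[i], i.e. the i-th row should pick item i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def clip_style_loss(seq_emb, struct_emb, temperature=0.07):
    """Symmetric contrastive loss for paired embeddings.

    seq_emb[i] and struct_emb[i] are assumed to describe the same antibody.
    The temperature value is an assumption, not taken from the paper.
    """
    seq = [normalize(v) for v in seq_emb]
    struct = [normalize(v) for v in struct_emb]
    # Cosine similarity of every sequence with every structure in the batch.
    logits = [
        [sum(a * b for a, b in zip(s, t)) / temperature for t in struct]
        for s in seq
    ]
    transposed = [list(column) for column in zip(*logits)]
    seq_to_struct = cross_entropy_on_diagonal(logits)
    struct_to_seq = cross_entropy_on_diagonal(transposed)
    return 0.5 * (seq_to_struct + struct_to_seq)


if __name__ == "__main__":
    paired = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    mismatched = [paired[1], paired[2], paired[0]]
    print("aligned pairs:   ", clip_style_loss(paired, paired))
    print("mismatched pairs:", clip_style_loss(paired, mismatched))
```

The loss is small when each sequence embedding is closest to its own structure embedding and large when the pairing is scrambled, which is the pressure that pulls the two modalities into a common space. In practice the embeddings would come from trainable sequence and structure encoders and the loss would be minimised by gradient descent, neither of which is shown here.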