Learning the Language of Antibody Hypervariability

Rohit Singh, Chiho Im, Yu Qiu, Brian Mackness, Abhinav Gupta, Taylor Sorenson, Samuel Sledzieski, Lena Erlach, Maria Wendt, Yves Nanfack, Bryan D Bryson, Bonnie Berger

DOI: https://doi.org/10.1101/2023.04.26.538476

2024-05-29

Abstract: Protein language models (PLMs) based on machine learning have demonstrated impressive success in predicting protein structure and function. However, general-purpose ("foundational") PLMs have limited performance on antibodies because of the latter's hypervariable regions, which do not conform to the evolutionary conservation principles that the models rely on. In this study, we propose a new transfer learning framework called AbMAP, which fine-tunes foundational models for antibody-sequence inputs by supervising on antibody structure and binding specificity examples. Our feature representations enable accurate prediction of an antibody's 3D structure, of mutational effects on antigen binding, and of its paratope. AbMAP's scalability paves the way for large-scale analyses of human antibody repertoires. AbMAP representations of repertoires reveal a remarkable overlap across individuals, transcending the limits of sequence analyses. Our findings provide compelling evidence for the hypothesis that antibody repertoires of individuals tend to converge towards comparable structural and functional coverage. We validate AbMAP for antibody optimization, applying it to optimize a set of antibodies that bind to a SARS-CoV-2 peptide and obtaining an 82% hit rate and up to a 22-fold increase in binding affinity. We anticipate AbMAP will accelerate the efficient design and modeling of antibodies and expedite the discovery of antibody-based therapeutics.

Bioinformatics

What problem does this paper attempt to address?

This paper addresses the difficulty that antibody hypervariability poses for protein language models (PLMs). The hypervariable regions of antibodies, such as the complementarity-determining regions (CDRs), do not conform to the evolutionary conservation principles that general-purpose PLMs rely on, so those models perform poorly when predicting antibody structure and function. The paper proposes a transfer learning framework called AbMAP (Antibody Mutagenesis-Augmented Processing), which fine-tunes foundational PLMs for antibody-sequence inputs by supervising on antibody structure and binding specificity examples. The main characteristics of AbMAP are:

1. Accuracy: its representations support prediction of an antibody's three-dimensional structure, of the effect of mutations on antigen binding, and of the paratope.
2. Scalability: it is suitable for large-scale analysis of human antibody repertoires, where it reveals substantial overlap between the repertoires of different individuals that sequence-level analysis does not capture.
3. Evidence for convergence: the results support the hypothesis that the antibody repertoires of different individuals tend to converge towards comparable structural and functional coverage.

The paper also validates AbMAP for antibody optimization: applied to a set of antibodies that bind a SARS-CoV-2 peptide, it achieved an 82% hit rate and up to a 22-fold increase in binding affinity. Additionally, AbMAP can be used for antibody design and variant-effect prediction, and it is reported to outperform existing methods such as AlphaFold 2 and the antibody-specific structure prediction tool DeepAb in predicting antibody structure and function. In summary, the paper aims to model the hypervariable regions of antibodies more accurately, thereby accelerating antibody design and optimization and facilitating the discovery of antibody-based therapeutics.

A Large Language Model Guides the Affinity Maturation of Variant Antibodies Generated by Combinatorial Optimization

Novel antibody language model accelerates IgG screening and design for broad-spectrum antiviral therapy

Addressing the antibody germline bias and its effect on language models for improved antibody design

Antibody Representation Learning for Drug Discovery

Protein language models enable prediction of polyreactivity of monospecific, bispecific, and heavy-chain-only antibodies

Reprogramming Pretrained Language Models for Antibody Sequence Infilling

Co-optimization of therapeutic antibody affinity and specificity using machine learning models that generalize to novel mutational space

Learning the heterogeneous hypermutation landscape of immunoglobulins from high-throughput repertoire data

Accurate Prediction of Antibody Function and Structure Using Bio-Inspired Antibody Language Model

OptMAVEn: a New Framework for the De Novo Design of Antibody Variable Region Models Targeting Specific Antigen Epitopes

S^2ALM: Sequence-Structure Pre-trained Large Language Model for Comprehensive Antibody Representation Learning

AlphaBind, a Domain-Specific Model to Predict and Optimize Antibody-Antigen Binding Affinity

Learning immune receptor representations with protein language models

On Pre-trained Language Models for Antibody

Large scale paired antibody language models

Machine learning optimization of candidate antibody yields highly diverse sub-nanomolar affinity antibody libraries

AbGPT: De Novo Antibody Design via Generative Language Modeling