Learning the Language of Antibody Hypervariability

Rohit Singh, Chiho Im, Yu Qiu, Brian Mackness, Abhinav Gupta, Taylor Sorenson, Samuel Sledzieski, Lena Erlach, Maria Wendt, Yves Nanfack, Bryan D Bryson, Bonnie Berger
DOI: https://doi.org/10.1101/2023.04.26.538476
2024-05-29
Abstract: Protein language models (PLMs) based on machine learning have demonstrated impressive success in predicting protein structure and function. However, general-purpose ("foundational") PLMs have limited performance in predicting antibodies due to the latter's hypervariable regions, which do not conform to the evolutionary conservation principles that the models rely on. In this study, we propose a new transfer learning framework called AbMAP, which fine-tunes foundational models for antibody-sequence inputs by supervising on antibody structure and binding specificity examples. Our feature representations accurately predict an antibody's 3D structure, mutational effects on antigen binding, and paratope identification. AbMAP's scalability paves the way for large-scale analyses of human antibody repertoires. AbMAP representations of repertoires reveal a remarkable overlap across individuals, transcending the limits of sequence analyses. Our findings provide compelling evidence for the hypothesis that antibody repertoires of individuals tend to converge towards comparable structural and functional coverage. We validate AbMAP for antibody optimization, applying it to optimize a set of antibodies that bind to a SARS-CoV-2 peptide and obtaining an 82% hit rate and up to a 22-fold increase in binding affinity. We anticipate AbMAP will accelerate the efficient design and modeling of antibodies and expedite the discovery of antibody-based therapeutics.
Bioinformatics
What problem does this paper attempt to address?
This paper addresses the difficulty that protein language models (PLMs) have with antibody hypervariability. The hypervariable regions of antibodies, notably the complementarity-determining regions (CDRs), do not conform to the evolutionary conservation principles that general-purpose PLMs rely on, so those models perform poorly when predicting antibody structure and function. The paper proposes a transfer learning framework called AbMAP (Antibody Mutagenesis-Augmented Processing), which fine-tunes foundational PLMs for antibody-sequence inputs by supervising on antibody structure and binding specificity examples. The key characteristics of AbMAP are:
1. Accuracy: its representations predict the three-dimensional structure of antibodies and the effect of mutations on antigen binding, and identify paratopes (the antibody residues that contact the antigen).
2. Scalability: it is suited to large-scale analysis of human antibody repertoires, where it reveals substantial overlap between the repertoires of different individuals that sequence-level analysis does not capture.
3. Evidence of convergence: the results support the hypothesis that the antibody repertoires of different individuals tend to converge towards comparable structural and functional coverage.
The paper also validates AbMAP for antibody optimization: applied to a set of antibodies that bind a SARS-CoV-2 peptide, it achieved an 82% hit rate and up to a 22-fold increase in binding affinity. Additionally, AbMAP can be used for antibody design and variant-effect prediction, and it outperforms existing methods such as AlphaFold 2 and the antibody-specific structure prediction tool DeepAb in predicting antibody structure and function. In summary, the paper aims to predict and understand the hypervariable regions of antibodies more accurately, thereby accelerating antibody design and optimization and facilitating the discovery of antibody-based therapeutics.
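The transfer-learning idea described above, adapting general-purpose embeddings with task-specific supervision, can be illustrated with a deliberately simplified sketch. This is not AbMAP's architecture or training procedure: it swaps in the simplest form of the pattern, a linear ridge-regression head trained on top of frozen features. Everything here is a hypothetical stand-in: random vectors play the role of foundational-PLM embeddings, the labels are synthetic, and names such as `plm_embeddings` and `fit_ridge_head` are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: random vectors play the role of frozen
# foundational-PLM embeddings; no real model or antibody data is used.
n_antibodies, plm_dim, label_dim = 200, 32, 4
plm_embeddings = rng.normal(size=(n_antibodies, plm_dim))

# Hypothetical supervision signal (e.g. a structure- or binding-derived
# descriptor per antibody), synthesized as a noisy linear function.
true_map = rng.normal(size=(plm_dim, label_dim))
noise = 0.1 * rng.normal(size=(n_antibodies, label_dim))
labels = plm_embeddings @ true_map + noise


def fit_ridge_head(features, targets, l2=1.0):
    """Closed-form ridge regression: W = (X^T X + l2 * I)^-1 X^T Y."""
    gram = features.T @ features + l2 * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ targets)


# Fit the head on the first 150 antibodies; hold out the remaining 50.
train, test = slice(0, 150), slice(150, None)
head = fit_ridge_head(plm_embeddings[train], labels[train])

# Task-adapted representations: frozen embeddings passed through the head.
adapted = plm_embeddings @ head

# Compare held-out error against a baseline that predicts the training mean.
head_mse = float(np.mean((adapted[test] - labels[test]) ** 2))
baseline_mse = float(np.mean((labels[train].mean(axis=0) - labels[test]) ** 2))
print(f"held-out MSE: head={head_mse:.3f}, mean baseline={baseline_mse:.3f}")
```

The point of the sketch is only the division of labor: the pretrained features stay fixed, and a small supervised component learns what the downstream task needs. A real system would replace the random vectors with actual PLM outputs and the linear head with a richer model.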