Abstract: Therapeutic antibodies have become one of the most influential classes of therapeutics in modern medicine for fighting infectious pathogens, cancer, and many other diseases. However, experimental screening for highly efficacious targeting antibodies is labor-intensive and costly. Antigen targets that evolve under selective pressure, such as fast-mutating viral variants, make this burden worse. As a proof of concept, we developed AbGen, a machine-learning-assisted antibody generation pipeline that greatly accelerates the screening and redesign of immunoglobulin G antibodies (IgGs) against a broad spectrum of SARS-CoV-2 variant strains. AbGen centers on a novel antibody language model (AbLM). AbLM is pretrained on 12 million generic protein domain sequences and fine-tuned on more than 4,000 paired VH-VL sequences, with IgG-specific CDR masking and VH-VL cross-attention. AbLM provides AbGen with a latent space of IgG sequence embeddings that supports two analyses. (a) The landscape of IgG activity in neutralizing the wild-type virus is analyzed through structure prediction of IgGs and of IgG-antigen interactions, where the antigen is the receptor-binding domain (RBD) of the viral spike protein. (b) The landscape of IgG susceptibility against variant viruses is predicted through Gaussian process regression, even though responses to variants of concern are available for as few as 14 clinical antibodies. We applied the AbGen pipeline to more than 1,300 IgG sequences collected from RBD-binding B cells of convalescent patients. In experimental validation, AbGen efficiently prioritized IgG candidates active against a broad spectrum of viral variants (wild type, Delta, and Omicron). These candidates prevented infection of host cells in vitro and of hACE2 transgenic mice in vivo. Existing protein language models require 10 to 100 times more parameters than AbLM. Compared with them, AbLM improved the precision of predicting IgGs with low variant susceptibility from around 50% to 75%.
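The Gaussian process regression step in (b) can be illustrated with a minimal, dependency-free sketch. The sketch rests on several assumptions:

- Each antibody is already represented by a fixed-length embedding vector. In AbGen these would come from AbLM; here they are toy numbers.
- The kernel is a squared-exponential (RBF) kernel with unit prior variance.
- The prior mean is the average of the labels.

The kernel choice, hyperparameters, and function names (`rbf_kernel`, `gp_predict`, and so on) are hypothetical and are not the authors' implementation. GP regression is suited to very few labels partly because it returns a posterior variance alongside each predicted mean. That variance flags embeddings that lie far from every labeled antibody.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, length_scale: float = 1.0) -> float:
    """Squared-exponential (RBF) covariance with unit prior variance (an assumed kernel)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)


def cholesky(matrix: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L @ L.T == matrix; matrix must be symmetric positive definite."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def solve_lower(lower: List[List[float]], b: Vector) -> List[float]:
    """Forward substitution: solve L x = b."""
    x: List[float] = []
    for i, row in enumerate(lower):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def solve_upper_t(lower: List[List[float]], b: Vector) -> List[float]:
    """Back substitution: solve L.T x = b using the same lower-triangular L."""
    n = len(lower)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lower[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / lower[i][i]
    return x


def gp_predict(train_x: List[Vector], train_y: List[float], test_x: List[Vector],
               noise: float = 1e-2, length_scale: float = 1.0) -> List[Tuple[float, float]]:
    """Posterior (mean, variance) of a GP fitted to labeled embeddings, per test embedding."""
    n = len(train_x)
    y_mean = sum(train_y) / n  # constant prior mean: center labels, add the mean back later
    centered = [y - y_mean for y in train_y]
    k = [[rbf_kernel(train_x[i], train_x[j], length_scale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    lower = cholesky(k)
    alpha = solve_upper_t(lower, solve_lower(lower, centered))  # K^{-1} (y - mean)
    predictions = []
    for x in test_x:
        k_star = [rbf_kernel(x, xi, length_scale) for xi in train_x]
        mean = y_mean + sum(ks * a for ks, a in zip(k_star, alpha))
        v = solve_lower(lower, k_star)  # L^{-1} k_*
        var = rbf_kernel(x, x, length_scale) - sum(vi * vi for vi in v)
        predictions.append((mean, max(var, 0.0)))  # clamp tiny negative round-off
    return predictions


if __name__ == "__main__":
    # Toy 2-D "embeddings" and labels; real language-model embeddings are far higher-dimensional.
    labeled = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    scores = [0.2, 0.8, 0.5]
    for mean, var in gp_predict(labeled, scores, [[0.5, 0.5], [10.0, 10.0]]):
        print(f"mean={mean:.3f}  variance={var:.3f}")
```

With a nonzero `noise` term, the predicted mean at a labeled point no longer reproduces that label exactly. For embeddings far from all labeled antibodies, the mean falls back to the label average and the variance returns to the prior value of 1.0. That is how such a model signals a less trustworthy prediction.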
Furthermore, AbGen enables structure-based computational redesign of selected IgG clones. Single amino acid substitutions at the RBD-binding interface doubled the IgG blockade efficacy against Delta (B.1.617), a severe, therapy-resistant strain. Our work expedites the application of artificial intelligence to antibody screening and redesign. It combines data-driven protein language models and Kriging (Gaussian process regression) for antibody sequence analysis and activity prediction. These work in synergy with physics-driven protein docking and design for antibody-antigen interface analysis and functional optimization.
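The candidate-generation side of single-substitution redesign can be sketched as exhaustive enumeration of point mutants at chosen interface positions. This is a hypothetical illustration. The function name, the sequence fragment, and the positions are placeholders. The step that actually ranks the mutants is not reproduced; that is the physics-driven docking and design described above. Each returned mutant would be passed to such a scoring step.

```python
from typing import Iterable, List, Tuple

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_substitutions(sequence: str, positions: Iterable[int]) -> List[Tuple[str, str]]:
    """Enumerate every single amino acid substitution at the given 0-based positions.

    Returns (label, mutant_sequence) pairs; labels follow the usual
    wild-type/position/mutant notation with 1-based positions (e.g. "S6A").
    """
    variants: List[Tuple[str, str]] = []
    for pos in sorted(set(positions)):
        if not 0 <= pos < len(sequence):
            raise IndexError(f"position {pos} is outside a sequence of length {len(sequence)}")
        wild_type = sequence[pos]
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue  # skip the no-op "substitution"
            mutant = sequence[:pos] + aa + sequence[pos + 1:]
            variants.append((f"{wild_type}{pos + 1}{aa}", mutant))
    return variants


if __name__ == "__main__":
    # Hypothetical CDR fragment and interface positions, for illustration only.
    fragment = "GFTFSSYA"
    interface_positions = [5, 6]
    candidates = single_substitutions(fragment, interface_positions)
    print(len(candidates), candidates[0])  # prints: 38 ('S6A', 'GFTFSAYA')
```

Each interface position with a standard wild-type residue yields 19 candidates. The candidate pool therefore grows linearly with the number of positions, which keeps structure-based scoring of every single mutant tractable.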