Abstract

Introduction
Conventional library-based antibody display can explore only a small fraction of the sequences generated by animal immunization, let alone exhaust the sequence diversity that could be turned into antibody therapies. Display-based screening is limited to sequences that can be displayed, which are only a subset of all sequences produced by B cells. Screening antibodies directly from single B cells, on the other hand, can be costly. Here, we introduce an artificial intelligence (AI)-enabled tool that searches a broader region of antibody sequence space at reduced cost. We trained a transformer-based model on sequences from an immunized library to cluster the clones. We also trained a generative adversarial network (GAN)-based model to generate novel sequences that could potentially be developed into antibody therapies.

Background and significance
One limitation in early antibody discovery is the number of functional candidates that can be selected. Our work provides an AI-enabled tool to discover and generate a panel of antibodies with differentiated binding strengths to a broad range of epitopes, ensuring functional coverage.

Methods & Results
We extracted 10⁴ sequences from the FACS-enriched yeast pool of a fully immunized alpaca (Lama pacos) by next-generation sequencing, and from these we assembled 10³ unique sdAb sequences. We fine-tuned a transformer-based deep learning model on these pre-processed sdAb sequences. The model had previously been trained on our dataset of 100,000 antibody sequences. Fine-tuning yielded representations that correlate with sequence homology and can be used to cluster clonal types. We postulate that these representations also encode long-range amino acid interactions in the 3D structure, which would explain why their clustering accuracy exceeds that of bioinformatics-based primary sequence homology analysis. This process is fully automated and optimized to require minimal computational resources.
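The clustering step can be illustrated with a minimal sketch. It assumes that each sequence has already been mapped to an embedding vector and that clonal groups are formed from pairwise embedding similarity. The fine-tuned transformer is not specified here, so the hypothetical `embed` function substitutes a simple 3-mer composition vector. The hypothetical `cluster` function then applies single-linkage clustering on cosine similarity with an assumed threshold of 0.8. Neither the embedding nor the threshold is the authors' method.

```python
import math
from collections import Counter
from itertools import combinations


def embed(seq: str, k: int = 3) -> Counter:
    """Hypothetical stand-in for a transformer embedding: k-mer counts."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(seqs: list[str], threshold: float = 0.8) -> list[list[int]]:
    """Single-linkage clustering: link any pair with cosine similarity >= threshold."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    vecs = [embed(s) for s in seqs]
    for i, j in combinations(range(len(seqs)), 2):
        if cosine(vecs[i], vecs[j]) >= threshold:
            parent[find(i)] = find(j)  # union the two groups

    groups: dict[int, list[int]] = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Toy, made-up CDR3-like strings; not data from the study.
    toy = ["ARDYYGSSYFDY", "ARDYYGSSYFDV", "AKGLRWLPPYDM", "AKGLRWLPPYDL"]
    print(cluster(toy))  # [[0, 1], [2, 3]]
```

Single linkage is used here only because it is the simplest rule that turns a similarity threshold into groups. A learned embedding would replace `embed`, and the threshold would need tuning against known clonal assignments.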
We selected 15 candidates from the AI-clustered clonal groups and experimentally measured their binding activity. Twelve candidates had Kd values on the order of 10⁻⁹ M and one had a Kd on the order of 10⁻⁸ M. The remaining candidates did not bind, giving a hit rate of 87% (13 binders out of 15 tested). The high sequence diversity of the CDR3s suggests that these nanobodies may bind a wide range of epitopes. For each binding candidate, we generated a CDR-diversifying virtual library (10³ sequences) by training a GAN-based model on the sequences from the binder's clonal group. This method incorporates the probability of each amino acid residue at each position, which provides a more precise mutagenesis route than PCR-based affinity maturation. The generated sequences offer wider CDR sequence diversity from which to select antibodies with differentiated affinities and epitopes, and therefore potentially different functionality.

Conclusion
Antibody discovery is a central step in early drug development. Identifying a wide range of functional candidates at this stage could increase the success rate and reduce risk in later development. We built an AI-enabled tool for searching for and generating functional antibodies from animal immunization libraries. We believe this technology could help deliver candidates with fine-tuned affinity and functionality.
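The idea of using position-specific residue probabilities to diversify CDRs can be illustrated without a GAN. The minimal sketch below substitutes a much simpler technique, an independent per-position frequency profile, and makes no claim to reproduce the authors' model. It assumes that the clonal group's CDR sequences are already aligned to equal length and contain only the 20 standard residues (alignment is not shown). The hypothetical `position_profile` function estimates per-position probabilities with an assumed pseudocount. The hypothetical `sample_library` function then draws unique variants from that profile to form a small virtual library.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; gaps are not modeled


def position_profile(aligned: list[str], pseudocount: float = 0.1) -> list[dict[str, float]]:
    """Estimate per-position residue probabilities from pre-aligned, equal-length sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be pre-aligned to equal length")
    if set("".join(aligned)) - set(AMINO_ACIDS):
        raise ValueError("only the 20 standard amino acids are supported")
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(length):
        counts = Counter(s[pos] for s in aligned)
        # Pseudocounts give residues unseen at this position a small nonzero probability.
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def sample_library(profile: list[dict[str, float]], n: int, seed: int = 0,
                   max_draws: int = 100_000) -> list[str]:
    """Draw up to n unique variants, sampling each position independently."""
    rng = random.Random(seed)
    residues = list(AMINO_ACIDS)
    library: set[str] = set()
    draws = 0
    while len(library) < n and draws < max_draws:
        variant = "".join(
            rng.choices(residues, weights=[p[aa] for aa in residues])[0] for p in profile
        )
        library.add(variant)
        draws += 1
    return sorted(library)


if __name__ == "__main__":
    # Toy, made-up CDR3-like sequences standing in for one clonal group.
    group = ["ARDYYGSSYFDY", "ARDYYGSAYFDY", "ARDFYGSSYFDY"]
    for variant in sample_library(position_profile(group), n=5):
        print(variant)
```

Because each position is sampled independently, this profile ignores correlations between residues. A generative model such as the GAN described above can in principle learn those dependencies.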

For antibody sequence generative modeling, mixture models may be all you need

Antibody-SGM, a Score-Based Generative Model for Antibody Heavy-Chain Design

Generative Antibody Design for Complementary Chain Pairing Sequences through Encoder-Decoder Language Model

Benchmarking Generative Models for Antibody Design & Exploring Log-Likelihood for Sequence Ranking

Generative Humanization for Therapeutic Antibodies

IgGM: A Generative Model for Functional Antibody and Nanobody Design

A comprehensive overview of recent advances in generative models for antibodies

GENERATION OF NOVEL ANTIBODY CANDIDATES USING TRANSFORMER AND GAN-BASED DEEP LEARNING ARTIFICIAL INTELLIGENCE

A generative foundation model for antibody sequence understanding

Energy-based generative models for monoclonal antibodies

AbGPT: De Novo Antibody Design via Generative Language Modeling

AB-Gen: Antibody Library Design with Generative Pre-trained Transformer and Deep Reinforcement Learning

Humatch - fast, gene-specific joint humanisation of antibody heavy and light chains

Learning the Language of Antibody Hypervariability

Protein language models enable prediction of polyreactivity of monospecific, bispecific, and heavy-chain-only antibodies

Addressing the antibody germline bias and its effect on language models for improved antibody design

BetterBodies: Reinforcement Learning guided Diffusion for Antibody Sequence Design

Large scale paired antibody language models

De novo generation of SARS-CoV-2 antibody CDRH3 with a pre-trained generative large language model