Abstract

Introduction: Conventional library-based antibody display can explore only a small fraction of the sequences generated by animal immunization, and so cannot exhaust the sequence diversity that could be turned into antibody therapies. Screening is limited to sequences that can be displayed, which constitute only a subset of all sequences generated by B cells, whereas screening directly from single B cells can be costly. Here, we introduce a novel artificial intelligence (AI)-enabled tool that navigates antibody discovery across a broader search space at reduced cost. We trained a transformer-based model on sequences from an immunized library to cluster the clones, and a generative adversarial network (GAN)-based model to generate novel sequences that could potentially be developed into antibody therapies.

Background and significance: One limitation in early antibody discovery is the number of functional candidates that can be selected. Our work provides an AI-enabled tool to discover and generate a panel of antibodies with differentiated binding strengths against a broad range of epitopes, ensuring functional coverage.

Methods & Results: We extracted 10⁴ sequences from the FACS-enriched yeast pool of a fully immunized alpaca (Lama pacos) using next-generation sequencing, from which we assembled 10³ unique sdAb sequences. We fine-tuned a transformer-based deep learning model, previously trained on our dataset of 100,000 antibody sequences, on these pre-processed sdAb sequences, yielding representations that correlate with sequence homology and can be used to cluster clonal types. We postulate that these representations also encode long-range amino acid interactions in the 3D structure, so that the clustering accuracy exceeds that of bioinformatics-based primary sequence homology analysis. The process is fully automated and optimized to require minimal computational resources.
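As a rough illustration of the clustering step described above, the sketch below groups clones whose embedding vectors are close in cosine similarity. It is only a minimal sketch under stated assumptions: the abstract does not specify the clustering algorithm or similarity measure, so single-linkage grouping with a cosine threshold is an assumption, and the function names, the threshold, and the toy three-dimensional vectors are all hypothetical stand-ins for real model embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_by_similarity(embeddings, threshold=0.9):
    """Single-linkage clustering: link any pair with cosine >= threshold.

    `embeddings` maps a clone id to its vector (hypothetical input format).
    Returns a sorted list of sorted clone-id lists, one per cluster.
    """
    ids = sorted(embeddings)
    parent = {i: i for i in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every pair that passes the similarity threshold.
    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy, made-up vectors standing in for transformer embeddings of four clones.
toy = {
    "clone_A": [1.0, 0.1, 0.0],
    "clone_B": [0.9, 0.2, 0.0],
    "clone_C": [0.0, 0.1, 1.0],
    "clone_D": [0.1, 0.0, 0.9],
}
clusters = cluster_by_similarity(toy, threshold=0.9)
print(clusters)
```

With these toy vectors, clones A and B fall into one group and C and D into another. The all-pairs comparison is quadratic in the number of clones, which is workable at the 10³ scale mentioned above but would need an approximate nearest-neighbour method for much larger libraries.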
We selected 15 candidates from the AI-clustered clonal groups and experimentally measured their binding activity. Twelve candidates had Kd values on the order of 10⁻⁹, one had a Kd on the order of 10⁻⁸, and one candidate was non-binding (hence a hit rate of 87%). The large sequence diversity of the CDR3 regions suggests that these nanobodies are potentially good binders for a wide range of epitopes. For each binding candidate, we generated a CDR-diversifying virtual library (10³) by training a GAN-based model on the sequences from the same clonal group as the binder. This method incorporates the probability of each amino acid residue at each specific position, which provides a more precise mutagenesis route than PCR-based affinity maturation. The generated sequences provided wider CDR sequence diversity for selecting antibodies with differentiated affinities and epitopes, which could yield candidates with different functionality.

Conclusion: Antibody discovery is a central step in early drug development, and identifying a wide range of functional candidates could increase the success rate and reduce risk in later development. We built an AI-enabled tool for searching and generating functional antibodies from animal immunization libraries. We believe this technology will help deliver candidates with fine-tuned affinity and functionality.
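The idea of position-specific residue probabilities within a clonal group, mentioned in Methods & Results, can be illustrated with a much simpler stand-in than a GAN. The sketch below is not the authors' model: it estimates per-position residue frequencies from equal-length sequences of one clonal group and samples new variants independently at each position. The function names and the example CDR3-like strings are made up for illustration, and the sketch assumes the sequences are already aligned to the same length.

```python
import random
from collections import Counter


def position_frequencies(aligned_seqs):
    """Per-position residue frequencies from equal-length sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    freqs = []
    for pos in range(length):
        counts = Counter(s[pos] for s in aligned_seqs)
        total = sum(counts.values())
        freqs.append({aa: c / total for aa, c in counts.items()})
    return freqs


def sample_variants(freqs, n, seed=0):
    """Draw n sequences, sampling each position independently."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    variants = []
    for _ in range(n):
        residues = [
            rng.choices(list(col), weights=list(col.values()))[0]
            for col in freqs
        ]
        variants.append("".join(residues))
    return variants


# Made-up CDR3-like strings standing in for one clonal group.
group = ["ARDYGS", "ARDYGT", "AKDYGS", "ARDWGS"]
freqs = position_frequencies(group)
variants = sample_variants(freqs, n=5)
print(freqs[1])   # residue frequencies at the second position
print(variants)
```

Unlike a GAN, this independent-position model ignores correlations between positions, so it can only recombine residues already observed at each site. It is meant only to make the notion of a per-position probability concrete.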

AVIDa-hIL6: A Large-Scale VHH Dataset Produced from an Immunized Alpaca for Predicting Antigen-Antibody Interactions

A SARS-CoV-2 Interaction Dataset and VHH Sequence Corpus for Antibody Language Models

Prediction of antigen-responding VHH antibodies by tracking the evolution of antibody along the time course of immunization

Development, High-Throughput Profiling, and Biopanning of a Large Phage Display Single-Domain Antibody Library

Development of novel humanized VHH synthetic libraries based on physicochemical analyses

Machine learning prediction of Antibody-Antigen binding: dataset, method and testing

Novel antibody language model accelerates IgG screening and design for broad-spectrum antiviral therapy

A Novel Polyclonal Antibody Against Human Cytomegalovirus: General Characteristics and Potential Application in Diagnosis

A Benchmark Dataset of Protein Antigens for Antigenicity Measurement

Distinct types of VHHs in Alpaca

Highland games: A benchmarking exercise in predicting biophysical and drug properties of monoclonal antibodies from amino acid sequences

Learning the heterogeneous hypermutation landscape of immunoglobulins from high-throughput repertoire data

GENERATION OF NOVEL ANTIBODY CANDIDATES USING TRANSFORMER AND GAN-BASED DEEP LEARNING ARTIFICIAL INTELLIGENCE

Using Interpretable Machine Learning to Massively Increase the Number of Antibody-Virus Interactions Across Studies

Large-scale data mining of four billion human antibody variable regions reveals convergence between therapeutic and natural antibodies that constrains search space for biologics drug discovery

AbAgIntPre: A Deep Learning Method for Predicting Antibody-Antigen Interactions Based on Sequence Information

Viral Immunogenicity Prediction by Machine Learning Methods

Machine learning application to predict binding affinity between peptide containing non-canonical amino acids and HLA0201

Protein language models enable prediction of polyreactivity of monospecific, bispecific, and heavy-chain-only antibodies

On the humanization of VHHs: Prospective case studies, experimental and computational characterization of structural determinants for functionality

Humatch - fast, gene-specific joint humanisation of antibody heavy and light chains