Abstract: Exquisite binding specificity is essential for many protein functions but is difficult to engineer. Many biotechnological or biomedical applications require the discrimination of very similar ligands, which poses the challenge of designing protein sequences with highly specific binding profiles. Experimental methods for generating specific binders rely on in vitro selection, which is limited in terms of library size and control over specificity profiles. Additional control was recently demonstrated through high-throughput sequencing and downstream computational analysis. Here we follow such an approach to demonstrate the design of specific antibodies beyond those probed experimentally. We do so in a context where very similar epitopes need to be discriminated, and where these epitopes cannot be experimentally dissociated from other epitopes present in the selection. Our approach involves the identification of different binding modes, each associated with a particular ligand against which the antibodies are either selected or not. Using data from phage display experiments, we show that the model successfully disentangles these modes, even when they are associated with chemically very similar ligands. Additionally, we demonstrate and validate experimentally the computational design of antibodies with customized specificity profiles, either with specific high affinity for a particular target ligand, or with cross-specificity for multiple target ligands. Overall, our results showcase the potential of leveraging a biophysical model learned from selections against multiple ligands to design proteins with tailored specificity, with applications to protein engineering extending beyond the design of antibodies.

A great challenge in protein science is to relate sequences to physical properties, both to predict physical properties from sequences and to design sequences with desired phenotypes. A promising solution lies in integrating large-scale selection experiments, high-throughput sequencing, and machine learning techniques. However, existing models often focus solely on the property under selection ("fitness") and lack interpretability. This limits our fundamental understanding of proteins and our ability to engineer them for desired properties, especially those not directly selectable in experiments. Previous studies have shown that incorporating biophysical constraints into models can offer quantitative insights, particularly for transcription factors. Here, we demonstrate that when coupled with extensive experiments, such modeling can not only predict physical features but also design new proteins with specific properties. Our demonstration involves a problem of primary biotechnological and biomedical interest: the design of antibodies with defined specificity profiles. We focus on one of the most challenging tasks in the field, designing antibodies capable of discriminating between structurally and chemically similar ligands. This approach has applications for creating antibodies with both specific and cross-specific binding properties and for mitigating experimental artifacts and biases in selection experiments. The combination of biophysics-informed modeling and extensive selection experiments holds broad applicability beyond antibodies, offering a powerful toolset for designing proteins with desired physical properties.
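The idea of one binding mode per ligand, with design for specificity or cross-specificity, can be illustrated with a toy model. This is only a minimal sketch under stated assumptions, not the authors' implementation: the text above does not give the functional form, so the additive per-position energies, the two-state selection probability, and every name (`random_mode`, `mode_energy`, `selection_probability`, `specificity_score`, `design_specific`) are hypothetical. Random numbers stand in for parameters that would be learned from phage display data.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def random_mode(length, rng):
    """Hypothetical energy table for one binding mode: one dict per position.

    In practice these values would be inferred from sequencing data;
    random values are used here purely for illustration.
    """
    return [{aa: rng.gauss(0.0, 1.0) for aa in ALPHABET} for _ in range(length)]


def mode_energy(seq, mode):
    """Additive energy of a sequence in one mode (lower = stronger binding)."""
    return sum(site[aa] for site, aa in zip(mode, seq))


def selection_probability(seq, modes, present):
    """Assumed two-state form: Boltzmann weight of the bound modes for the
    ligands in `present`, competing against an unbound state of weight 1."""
    bound = sum(math.exp(-mode_energy(seq, modes[name])) for name in present)
    return bound / (1.0 + bound)


def specificity_score(seq, modes, target, off_target):
    """Energy gap; positive when the sequence favors `target` over `off_target`."""
    return mode_energy(seq, modes[off_target]) - mode_energy(seq, modes[target])


def design_specific(modes, target, off_target):
    """Maximize the energy gap. Because the assumed energies are additive,
    the best residue can be chosen independently at each position."""
    pairs = zip(modes[target], modes[off_target])
    return "".join(
        max(ALPHABET, key=lambda aa: o_site[aa] - t_site[aa])
        for t_site, o_site in pairs
    )


if __name__ == "__main__":
    rng = random.Random(0)
    modes = {"ligand_A": random_mode(4, rng), "ligand_B": random_mode(4, rng)}
    seq = design_specific(modes, "ligand_A", "ligand_B")
    print(seq, specificity_score(seq, modes, "ligand_A", "ligand_B"))
    print(selection_probability(seq, modes, ["ligand_A"]))
```

A cross-specific design would minimize the sum of the two mode energies instead of maximizing their gap. The per-position shortcut relies entirely on the additivity assumption; a model with couplings between positions would need a search procedure instead.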
