Abstract: Accurate prediction and optimization of protein-protein binding affinity is crucial for therapeutic antibody development. Although machine learning-based $\Delta\Delta G$ prediction methods are suitable for large-scale mutant screening, they struggle to predict the effects of multiple mutations for targets without existing binders. Energy function-based methods, though more accurate, are time-consuming and not ideal for large-scale screening. To address this, we propose an active learning workflow that efficiently trains a deep learning model to learn energy functions for specific targets, combining the advantages of both approaches. Our method integrates the RDE-Network deep learning model with Rosetta's energy function-based Flex ddG to efficiently explore mutants. In a case study targeting HER2-binding Trastuzumab mutants, our approach significantly improved the screening performance over random selection and demonstrated the ability to identify mutants with better binding properties without experimental $\Delta\Delta G$ data. This workflow advances computational antibody design by combining machine learning, physics-based computations, and active learning to achieve more efficient antibody development.
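The workflow in the abstract (a cheap learned model deciding which mutants are worth sending to an expensive physics-based scorer) can be sketched as a pool-based active-learning loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `toy_oracle` is a hypothetical stand-in for Flex ddG, a Hamming-distance k-nearest-neighbour average is a hypothetical stand-in for RDE-Network, and the 5-residue toy sequences, batch size and cycle count are arbitrary illustrative choices. Selection here is purely greedy (best predicted $\Delta\Delta G$ first).

```python
import itertools
import random
from typing import Callable, Dict, List

WILD_TYPE = "AAAAA"  # hypothetical 5-residue toy segment, not a real antibody region


def toy_oracle(seq: str) -> float:
    """Hypothetical stand-in for an expensive scorer such as Flex ddG.

    Returns a toy ddG relative to WILD_TYPE (negative = tighter binding).
    """
    preferred = {0: "Y", 2: "W", 4: "R"}  # arbitrary toy preferences
    ddg = 0.0
    for pos, aa in enumerate(seq):
        if aa == WILD_TYPE[pos]:
            continue
        ddg += -1.0 if preferred.get(pos) == aa else 0.3
    return ddg


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(seq: str, labeled: Dict[str, float], k: int = 3) -> float:
    """Cheap surrogate (hypothetical stand-in for a trained deep model):
    mean oracle score of the k labeled mutants closest in Hamming distance."""
    nearest = sorted(labeled, key=lambda s: hamming(seq, s))[:k]
    return sum(labeled[s] for s in nearest) / len(nearest)


def active_learning(pool: List[str], oracle: Callable[[str], float],
                    n_init: int = 4, batch: int = 4, cycles: int = 3,
                    seed: int = 0) -> Dict[str, float]:
    rng = random.Random(seed)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)
    # Seed the training set with a few random mutants scored by the oracle.
    labeled = {s: oracle(s) for s in unlabeled[:n_init]}
    unlabeled = unlabeled[n_init:]
    for _ in range(cycles):
        if not unlabeled:
            break
        # "Retraining" is implicit: the k-NN surrogate reads the growing labeled set.
        # Rank the remaining pool by predicted ddG, most negative first.
        unlabeled.sort(key=lambda s: knn_predict(s, labeled))
        picked, unlabeled = unlabeled[:batch], unlabeled[batch:]
        # Only the selected mutants go through the expensive oracle.
        for s in picked:
            labeled[s] = oracle(s)
    return labeled


if __name__ == "__main__":
    pool = ["".join(p) for p in itertools.product("AYWR", repeat=5)]
    results = active_learning(pool, toy_oracle)
    best = min(results, key=results.get)
    print(len(results), "oracle calls; best:", best, results[best])
```

With `n_init=4`, `batch=4` and `cycles=3`, the oracle is called 16 times instead of once for each of the 1,024 candidates; in the real workflow each such call would be a Flex ddG run, which is where the saving matters.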

What problem does this paper attempt to address?

This paper addresses several key problems in antibody optimization, above all the accurate prediction and optimization of protein-protein binding affinity in therapeutic antibody development. Specifically, it targets the following problems:

1. **Challenges in large-scale mutation screening**: Machine-learning-based prediction methods are suitable for large-scale mutation screening, but they struggle to predict the effects of multiple mutations for targets without existing binders. They are also prone to overfitting, which limits their ability to improve sequences in practice.
2. **Limitations of energy-function-based methods**: Energy-function-based methods (such as Rosetta's Flex ddG) are more accurate, but the structural sampling they require makes them computationally expensive and ill-suited to large-scale screening.
3. **Combining the advantages of the two approaches**: The paper proposes an active-learning workflow that efficiently trains a deep-learning model to learn the energy function for a specific target, combining the strengths of machine-learning and energy-function-based methods for more efficient antibody development.
4. **Optimization with insufficient experimental data**: The paper shows how computed binding information can be used, through multi-task learning, to improve the screening of antibody sequences, in particular for binding affinity, in the absence of experimental ΔΔG data.

### Main contributions

- **A new active-learning workflow**: The workflow combines the RDE-Network deep-learning model with Rosetta's Flex ddG and can identify mutants with better binding properties without experimental ΔΔG data.
- **Improved screening performance**: In a case study on Trastuzumab mutants binding to HER2, the method significantly outperforms random selection in screening and improves binding classification performance without experimental information.
- **Balancing exploration and exploitation**: Selecting mutants that differ substantially from one another in each active-learning cycle balances exploration and exploitation during screening and thereby improves the model's learning efficiency.

### Conclusion

This study proposes an effective active-learning workflow for optimizing antibody sequences. Even without experimental data, it can substantially improve screening performance and the accuracy of binding-affinity prediction. Although demonstrated on Trastuzumab, the method may also be applicable to the design and optimization of other antibodies.
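The exploration/exploitation point from the contributions list can be illustrated with a greedy batch-selection rule that trades predicted ΔΔG against sequence diversity. The summary does not state the paper's exact selection criterion, so this is an assumption: `select_diverse_batch`, the Hamming-distance novelty term and `diversity_weight` are all hypothetical, and this is one common heuristic rather than the authors' method.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def select_diverse_batch(predicted: Dict[str, float], batch_size: int,
                         diversity_weight: float = 1.0) -> List[str]:
    """Hypothetical greedy rule. Each step picks the mutant maximizing
    -predicted_ddG + diversity_weight * (min Hamming distance to the batch so far).
    Lower ddG = better predicted binder."""
    remaining = dict(predicted)  # copy so the caller's dict is not modified
    batch: List[str] = []
    while remaining and len(batch) < batch_size:
        def score(seq: str) -> float:
            # Novelty is 0 for the first pick, so it is chosen on ddG alone.
            novelty = min((hamming(seq, b) for b in batch), default=0)
            return -remaining[seq] + diversity_weight * novelty
        best = max(remaining, key=score)
        batch.append(best)
        del remaining[best]
    return batch


predicted = {"YAWAR": -2.5, "YAWAA": -2.4, "AAAAR": -2.0, "GAAAA": 0.5}
print(select_diverse_batch(predicted, 2, diversity_weight=0.0))  # ['YAWAR', 'YAWAA']
print(select_diverse_batch(predicted, 2, diversity_weight=1.0))  # ['YAWAR', 'AAAAR']
```

With `diversity_weight=0` the rule reduces to pure exploitation and takes the two best-predicted mutants, which differ at a single position. A positive weight pulls in a slightly worse but more distant mutant, spending part of the expensive scoring budget on a less-explored region of sequence space.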

Active learning for energy-based antibody optimization and enhanced screening