Active learning for energy-based antibody optimization and enhanced screening

Kairi Furui, Masahito Ohue
2024-09-18
Abstract: Accurate prediction and optimization of protein-protein binding affinity are crucial for therapeutic antibody development. Although machine learning-based $\Delta\Delta G$ prediction methods are suitable for large-scale mutant screening, they struggle to predict the effects of multiple mutations for targets without existing binders. Energy function-based methods, though more accurate, are time-consuming and not ideal for large-scale screening. To address this, we propose an active learning workflow that efficiently trains a deep learning model to learn energy functions for specific targets, combining the advantages of both approaches. Our method integrates the RDE-Network deep learning model with Rosetta's energy function-based Flex ddG to efficiently explore mutants. In a case study targeting HER2-binding Trastuzumab mutants, our approach significantly improved screening performance over random selection and demonstrated the ability to identify mutants with better binding properties without experimental $\Delta\Delta G$ data. This workflow advances computational antibody design by combining machine learning, physics-based computations, and active learning to achieve more efficient antibody development.
Biomolecules, Artificial Intelligence, Machine Learning, Quantitative Methods
What problem does this paper attempt to address?
This paper addresses several problems in antibody optimization, chiefly the accurate prediction and optimization of protein-protein binding affinity in therapeutic antibody development. Specifically, it targets the following:

1. **Challenges in large-scale mutation screening**: Machine-learning-based prediction methods scale to large mutant libraries, but they struggle to predict the effects of multiple mutations for targets without existing binders. They are also prone to overfitting, which limits their ability to improve sequences in practical applications.
2. **Limitations of energy-function-based methods**: Energy-function-based methods (such as Rosetta's Flex ddG) are more accurate, but they require structural sampling, which makes them computationally expensive and unsuitable for large-scale screening.
3. **Combining the advantages of the two methods**: The paper proposes an active-learning workflow that efficiently trains a deep-learning model to learn the energy function of a specific target, combining the advantages of machine-learning and energy-function-based methods to achieve more efficient antibody development.
4. **Optimization with insufficient experimental data**: The paper shows how computed binding information can be used, through multi-task learning, to improve the screening of antibody sequences for binding affinity when no experimental ΔΔG data are available.

### Main contributions

- **Proposing a new active-learning workflow**: The workflow combines the RDE-Network deep-learning model with Rosetta's Flex ddG method and can identify mutants with better binding properties in the absence of experimental ΔΔG data.
- **Improving screening performance**: A case study on Trastuzumab mutants binding to HER2 shows that the method significantly outperforms random selection in screening and can improve binding classification performance without experimental information.
- **Balancing exploration and exploitation**: In each active-learning cycle, the workflow selects mutants that differ substantially from one another, which balances exploration and exploitation during screening and improves the learning efficiency of the model.

### Conclusion

This study proposes an effective active-learning workflow for optimizing antibody sequences, especially when experimental data are unavailable, that can significantly improve screening performance and the prediction accuracy of binding affinity. The method is demonstrated on Trastuzumab, and it may also apply to the design and optimization of other antibodies.
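
The active-learning cycle summarized above (train a fast surrogate on labels from an expensive energy calculation, pick a diverse batch of promising mutants, label them, repeat) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `toy_oracle` is a hypothetical linear stand-in for the expensive calculation (it is not Flex ddG), `fit_ridge` is a closed-form ridge regression standing in for the deep-learning model (it is not RDE-Network), and the Hamming-distance filter in `select_batch` is only one simple way to keep the selected mutants dissimilar. The sketch also assumes that a lower predicted ΔΔG is better.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    """Flatten a sequence into a one-hot feature vector."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for i, aa in enumerate(seq):
        x[i, AMINO_ACIDS.index(aa)] = 1.0
    return x.ravel()

def make_pool(parent, n, rng, max_mut=3):
    """Hypothetical helper: n unique sequences with up to max_mut substitutions."""
    pool = set()
    while len(pool) < n:
        seq = list(parent)
        n_mut = rng.integers(1, max_mut + 1)
        for pos in rng.choice(len(parent), size=n_mut, replace=False):
            seq[pos] = AMINO_ACIDS[rng.integers(len(AMINO_ACIDS))]
        pool.add("".join(seq))
    return sorted(pool)

def toy_oracle(seq, weights):
    """Assumed stand-in for an expensive energy calculation (NOT Flex ddG)."""
    return float(one_hot(seq) @ weights)

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression; assumed surrogate (NOT RDE-Network)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))

def select_batch(pool, scores, batch_size, min_dist):
    """Greedy selection: best predicted score first (exploitation), skipping
    candidates closer than min_dist to one already chosen (exploration).
    May return fewer than batch_size indices if the filter is too strict."""
    chosen = []
    for idx in np.argsort(scores):
        if all(hamming(pool[idx], pool[j]) >= min_dist for j in chosen):
            chosen.append(int(idx))
        if len(chosen) == batch_size:
            break
    return chosen

def active_learning(pool, oracle, n_cycles=5, batch_size=8, min_dist=2, seed=0):
    """Return {sequence: oracle score} for every mutant the loop labeled."""
    rng = np.random.default_rng(seed)
    pool = list(pool)
    labeled = {}
    # Cycle 0: label a random batch so the surrogate has something to fit.
    for i in rng.choice(len(pool), size=batch_size, replace=False):
        labeled[pool[i]] = oracle(pool[i])
    for _ in range(n_cycles):
        unlabeled = [s for s in pool if s not in labeled]
        if not unlabeled:
            break
        # Retrain the surrogate on all labels collected so far.
        X = np.array([one_hot(s) for s in labeled])
        y = np.array(list(labeled.values()))
        w = fit_ridge(X, y)
        # Score the remaining pool cheaply, then label a diverse batch.
        scores = np.array([one_hot(s) @ w for s in unlabeled])
        for i in select_batch(unlabeled, scores, batch_size, min_dist):
            labeled[unlabeled[i]] = oracle(unlabeled[i])
    return labeled
```

In this sketch the expensive oracle is called only for the selected batch in each cycle, while the surrogate scores the whole remaining pool; that division of labor is the point of the workflow. Replacing the greedy distance filter with another acquisition rule would change how exploration and exploitation are traded off without changing the loop itself.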