Improving Antibody Design with Force-Guided Sampling in Diffusion Models

Paulina Kulytė, Francisco Vargas, Simon Valentin Mathis, Yu Guang Wang, José Miguel Hernández-Lobato, Pietro Liò
2024-06-09
Abstract: Antibodies, crucial for immune defense, primarily rely on complementarity-determining regions (CDRs) to bind and neutralize antigens, such as viruses. The design of these CDRs determines the antibody's affinity and specificity towards its target. Generative models, particularly denoising diffusion probabilistic models (DDPMs), have shown potential to advance the structure-based design of CDR regions. However, only a limited dataset of bound antibody-antigen structures is available, and generalization to out-of-distribution interfaces remains a challenge. Physics-based force fields, which approximate atomic interactions, offer a coarse but universal source of information to better mold designs to target interfaces. Integrating this foundational information into diffusion models is, therefore, highly desirable. Here, we propose a novel approach to enhance the sampling process of diffusion models by integrating force field energy-based feedback. Our model, DiffForce, employs forces to guide the diffusion sampling process, effectively blending the two distributions. Through extensive experiments, we demonstrate that our method guides the model to sample CDRs with lower energy, enhancing both the structure and sequence of the generated antibodies.
Quantitative Methods, Machine Learning, Biomolecules
What problem does this paper attempt to address?
This paper addresses how to generate antibody complementarity-determining regions (CDRs) with lower energy and better structure and sequence by guiding diffusion model sampling with a force field. Specifically, it targets the following challenges:

1. **Dataset limitations**: Few bound antibody-antigen structures are available, so models perform poorly on out-of-distribution interfaces.
2. **Limitations of traditional methods**: Traditional antibody design methods, such as animal immunization and computational methods based on complex biophysical energy functions, raise ethical concerns, incur high computational costs, and are prone to getting trapped in local optima.
3. **Limitations of diffusion models**: Although diffusion models generate high-quality protein structures, they still struggle with unseen data, especially when generating antibody CDRs with high affinity and specificity.

To address these problems, the paper proposes **DiffForce**, a method that enhances the sampling process of a diffusion model by integrating physics-based force-field energy feedback. DiffForce uses the force field to guide diffusion sampling, effectively combining two distributions: the data distribution and the energy distribution. As a result, the model generates antibody CDRs with lower energy and better structure and sequence, improving the quality of antibody design.

### Main contributions

- **First force-field-guided diffusion model**: A differentiable force field guides the sampling process, so that samples are effectively drawn from a weighted geometric mean of the two distributions. Unlike existing methods, this model does not require training a separate energy approximation network or a conditional diffusion model.
- **A method for approximating denoised samples**: Using an interpolation interpretation, the paper approximates the denoised sample so that energies can be computed accurately and the force field applied correctly during diffusion sampling. It also proposes a method for approximating amino acid types and orientations.

### Experimental results

The paper evaluates DiffForce experimentally:

- **Binding energy improvement**: The antibody CDRs generated by DiffForce show a significant improvement in binding energy, especially in the H1 and H3 regions.
- **Structural diversity**: The generated CDRs are more structurally diverse, as indicated by higher RMSD values.
- **Sequence accuracy**: The generated CDR sequences agree more closely with the reference sequence, especially in the H1 and H2 regions.

These results indicate that DiffForce has significant advantages in generating high-quality antibody CDRs, especially for the complex H3 region.
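
The force-guided sampling idea listed under the main contributions can be illustrated with a small toy example. The sketch below is not the paper's implementation: it assumes a one-dimensional Gaussian "data" distribution whose noised score is known in closed form (standing in for a trained diffusion model), a quadratic energy (standing in for a differentiable force field), and a simple guidance rule that adds the force evaluated at the approximated denoised sample directly to the score. All function names, the noise schedule, and the `guidance_weight` parameter are hypothetical choices made for illustration.

```python
import numpy as np


def make_schedule(num_steps=200, beta_min=1e-4, beta_max=0.05):
    """Linear DDPM noise schedule (an assumed, illustrative choice)."""
    betas = np.linspace(beta_min, beta_max, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def data_score(x_t, alpha_bar, mu=-1.0, sigma=0.5):
    """Exact score of the noised toy data distribution N(mu, sigma^2).

    Stands in for the score predicted by a trained diffusion model.
    """
    var_t = alpha_bar * sigma**2 + (1.0 - alpha_bar)
    return -(x_t - np.sqrt(alpha_bar) * mu) / var_t


def energy(x, center=1.0, stiffness=1.0):
    """Toy quadratic energy standing in for a physics-based force field."""
    return 0.5 * stiffness * (x - center) ** 2


def force(x, center=1.0, stiffness=1.0):
    """Force is the negative gradient of the energy."""
    return -stiffness * (x - center)


def sample(num_samples, guidance_weight, seed=0):
    """Ancestral DDPM sampling with optional force guidance."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule()
    x = rng.standard_normal(num_samples)  # start from the Gaussian prior
    for t in reversed(range(len(betas))):
        score = data_score(x, alpha_bars[t])
        # Approximate the denoised sample from the current noisy one
        # (Tweedie's formula), then evaluate the force on that estimate.
        x0_hat = (x + (1.0 - alpha_bars[t]) * score) / np.sqrt(alpha_bars[t])
        # Simplification: the force is added to the score directly, without
        # the Jacobian of x0_hat with respect to x.
        guided_score = score + guidance_weight * force(x0_hat)
        mean = (x + betas[t] * guided_score) / np.sqrt(alphas[t])
        noise = rng.standard_normal(num_samples) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x


unguided = sample(5000, guidance_weight=0.0)
guided = sample(5000, guidance_weight=1.0)
print("mean energy, unguided:", energy(unguided).mean())
print("mean energy, guided:  ", energy(guided).mean())
```

In this toy setting the guided samples shift from the data mode toward the energy minimum, so their mean energy is lower than that of the unguided samples. Increasing `guidance_weight` puts more weight on the energy term and less on the data distribution, which is the trade-off a blend of the two distributions implies.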