Developing Differentiable Long-Range Force Field for Proteins with E(3) Neural Network Predicted Asymptotic Parameters

Zheng Cheng, Hangrui Bi, Siyuan Liu, Junmin Chen, Kuang Yu
DOI: https://doi.org/10.26434/chemrxiv-2024-kkb5h
2024-03-11
Abstract: Accurately describing long-range interactions is a significant challenge in molecular dynamics (MD) simulations of proteins. A high-quality long-range potential is also an important component of range-separated machine learning force fields. This study introduces a comprehensive asymptotic parameter database, encompassing atomic multipole moments, polarizabilities, and dispersion coefficients. Leveraging active learning, our database comprehensively represents protein fragments with up to 8 heavy atoms, capturing their conformational diversity with merely 78,000 data points. Additionally, an E(3) neural network (E3NN) is employed to predict the asymptotic parameters directly from the local geometry. The E3NN models demonstrate exceptional accuracy and transferability across all asymptotic parameters, achieving an R² of 0.999 for both protein fragments and 20 amino acid dipeptide test sets. The long-range electrostatic and dispersion energies can be obtained using the E3NN-predicted parameters, with errors of 0.07 and 0.02 kcal/mol, respectively, when compared to Symmetry-Adapted Perturbation Theory (SAPT). Therefore, our force fields demonstrate the capability to accurately describe long-range interactions in proteins, paving the way for next-generation protein force fields.
Chemistry
What problem does this paper attempt to address?
This paper addresses the problem of accurately describing long-range interactions in molecular dynamics (MD) simulations of proteins. The specific challenge is constructing a high-quality long-range potential, which is also an important component of range-separated machine learning force fields.

The authors build a comprehensive asymptotic parameter database covering atomic multipole moments, polarizabilities, and dispersion coefficients. The database is constructed by systematically sampling the chemical and geometric space of protein fragments with up to 8 heavy atoms, and active learning lets it capture their conformational diversity with only 78,000 data points. An E(3) neural network (E3NN) then learns the mapping from local geometry to these asymptotic parameters, showing high accuracy and transferability across all of them (R² of 0.999 on both the protein fragment and the 20 amino acid dipeptide test sets).

The long-range electrostatic and dispersion energies are computed from the E3NN-predicted parameters rather than predicted directly by the network. Compared with Symmetry-Adapted Perturbation Theory (SAPT), the errors are 0.07 and 0.02 kcal/mol, respectively. The authors have also developed a PyTorch-based implementation of DMFF to make the force field calculation differentiable, enabling deployment in actual MD simulations.

In summary, the paper combines a physics-based asymptotic description with machine learning models that predict its parameters. This improves the accuracy of long-range interaction energies in proteins and lays a foundation for the next generation of protein force fields.
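To make the step from per-atom asymptotic parameters to long-range energies concrete, here is a minimal sketch in plain Python. It is not the authors' implementation and makes several simplifying assumptions that the source does not state: electrostatics is truncated to atomic point charges (the paper uses multipole moments), dispersion keeps only the C6 term with no damping, and pair coefficients use the geometric-mean combination rule. The function name `long_range_energy` and the unit choices (Å, elementary charges, C6 in kcal/mol·Å⁶) are hypothetical.

```python
import math

# Coulomb constant in kcal*Å/(mol*e^2), the usual MD unit conversion factor.
COULOMB_KCAL = 332.0637


def long_range_energy(pos_a, pos_b, q_a, q_b, c6_a, c6_b):
    """Return (E_elec, E_disp) in kcal/mol between fragments A and B.

    Assumptions (illustrative only, not taken from the paper):
      - point charges only, no higher multipoles or polarization
      - undamped -C6/r^6 dispersion
      - C6_ij = sqrt(C6_i * C6_j) combination rule
    """
    e_elec = 0.0
    e_disp = 0.0
    for r_i, q_i, c_i in zip(pos_a, q_a, c6_a):
        for r_j, q_j, c_j in zip(pos_b, q_b, c6_b):
            r = math.dist(r_i, r_j)             # interatomic distance in Å
            e_elec += COULOMB_KCAL * q_i * q_j / r
            c6_ij = math.sqrt(c_i * c_j)        # assumed combination rule
            e_disp -= c6_ij / r**6
    return e_elec, e_disp


# Toy usage: one atom per fragment, 2 Å apart, opposite unit charges.
e_elec, e_disp = long_range_energy(
    [(0.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)],
    [1.0], [-1.0],
    [64.0], [64.0],
)
```

In the workflow described above, the charges and C6 values would come from a model that maps local geometry to parameters instead of being fixed numbers, and writing the same sums in an automatic differentiation framework would give forces as gradients of the energy with respect to positions.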