Developing Differentiable Long-Range Force Field for Proteins with E(3) Neural Network Predicted Asymptotic Parameters

Zheng Cheng, Hangrui Bi, Siyuan Liu, Junmin Chen, Kuang Yu, Alston J. Misquitta
DOI: https://doi.org/10.26434/chemrxiv-2024-kkb5h-v2
2024-05-29
Abstract: Accurately describing long-range interactions is a significant challenge in molecular dynamics (MD) simulations of proteins, and a high-quality long-range potential is also an important component of range-separated machine learning force fields. This study introduces a comprehensive asymptotic parameter database encompassing atomic multipole moments, polarizabilities, and dispersion coefficients. Leveraging active learning, our database comprehensively represents protein fragments with up to 8 heavy atoms, capturing their conformational diversity with merely 78,000 data points. Additionally, an E(3) neural network (E3NN) is employed to predict the asymptotic parameters directly from the local geometry. The E3NN models demonstrate exceptional accuracy and transferability across all asymptotic parameters, achieving an R² of 0.999 for both the protein fragment and the 20 amino acid dipeptide test sets. The long-range electrostatic and dispersion energies can be obtained using the E3NN-predicted parameters, with errors of 0.07 and 0.02 kcal/mol, respectively, when compared to Symmetry-Adapted Perturbation Theory (SAPT). Therefore, our force fields demonstrate the capability to accurately describe long-range interactions in proteins, paving the way for next-generation protein force fields.
Chemistry
What problem does this paper attempt to address?
This paper aims to address the challenge of describing long-range interactions in protein molecular dynamics (MD) simulations. Existing force fields typically use empirical parameters and simple functional forms, which fail to accurately capture the microscopic details of the potential energy surface and so limit their ability to predict macroscopic properties. The study introduces a comprehensive asymptotic parameter database covering atomic multipole moments, polarizabilities, and dispersion coefficients. The database is built with an active learning approach and represents protein fragments with up to 8 heavy atoms using about 78,000 data points. An E(3) neural network (E3NN) is then trained to predict these parameters directly from the local geometry. The E3NN models show high accuracy and transferability across all parameters, with an R² of 0.999 on the test sets, and the long-range electrostatic and dispersion energies computed from the predicted parameters have errors of 0.07 and 0.02 kcal/mol, respectively, compared to Symmetry-Adapted Perturbation Theory (SAPT). This indicates that the force field can accurately describe long-range interactions in proteins, laying the foundation for next-generation protein force fields.