Molecular Simulations with a Pretrained Neural Network and Universal Pairwise Force Fields

Adil Kabylda, J. Thorben Frank, Sergio Suarez Dou, Almaz Khabibrakhmanov, Leonardo Medrano Sandonas, Oliver T. Unke, Stefan Chmiela, Klaus-Robert Muller, Alexandre Tkatchenko
DOI: https://doi.org/10.26434/chemrxiv-2024-bdfr0
2024-10-08
Abstract: Machine Learning Force Fields (MLFFs) promise to enable general molecular simulations that can simultaneously achieve efficiency, accuracy, transferability, and scalability for diverse molecules, materials, and hybrid interfaces. A key step toward this goal has been made with the GEMS approach to biomolecular dynamics [Sci. Adv. 10, eadn4397 (2024)]. This work introduces the SO3LR method that integrates the fast and stable SO3krates neural network for semi-local interactions with universal pairwise force fields designed for short-range repulsion, long-range electrostatics, and dispersion interactions. SO3LR is trained on a diverse set of 4 million neutral and charged molecular complexes computed at the PBE0+MBD level of quantum mechanics, ensuring a comprehensive coverage of covalent and non-covalent interactions. Our approach is characterized by computational and data efficiency, scalability to 200 thousand atoms on a single GPU, and reasonable to high accuracy across the chemical space of organic (bio)molecules. SO3LR is applied to study units of four major biomolecule types, polypeptide folding, and nanosecond dynamics of larger systems such as a protein, a glycoprotein, and a lipid bilayer, all in explicit solvent. Finally, we discuss the future challenges toward truly general molecular simulations by combining MLFFs with traditional atomistic models.
Chemistry
What problem does this paper attempt to address?
This paper aims to achieve Efficiency, Accuracy, Scalability, and Transferability simultaneously in molecular simulations (the EAST requirements). Existing methods usually make significant compromises on at least one of these, which limits their scope of application. The paper addresses this challenge by proposing a new machine-learning force field (MLFF), the SO3LR method. SO3LR combines the fast and stable SO3krates neural network for semi-local interactions with universal pairwise force fields designed for short-range repulsion, long-range electrostatics, and dispersion interactions. In this way, SO3LR aims to provide a general molecular simulation tool that applies to different molecules, materials, and hybrid interfaces while retaining computational efficiency, data efficiency, and accuracy. Specifically, the paper addresses the following key issues:

1. **Constructing an atomistic force field model that meets the EAST requirements**: Traditional force field models are based either on approximate but fast mechanistic expressions or on accurate but extremely costly ab initio electronic structure calculations. Each sacrifices one property for the other: the former gives up accuracy, the latter efficiency. SO3LR aims to overcome these limitations by combining a machine-learning model with physically inspired pairwise potential energy terms.
2. **Describing semi-local and long-range interactions**: SO3LR describes semi-local many-body interactions through the SO3krates model and handles short-range repulsion, long-range electrostatics, and dispersion interactions through physical pairwise terms. This combination is intended to make the model widely applicable across chemical space.
3. **Comprehensive coverage of the dataset**: SO3LR is trained on a dataset of 4 million neutral and charged molecular complexes computed at the PBE0+MBD level of quantum mechanics, covering a comprehensive range of covalent and non-covalent interactions.
4. **Scalability for large-scale simulations**: SO3LR scales to systems of 200,000 atoms on a single GPU, with reasonable to high accuracy across the chemical space of organic (bio)molecules.

Through these improvements, SO3LR has demonstrated its performance and potential in simulations of small-molecule units, the folding of polyalanine systems, and the dynamics of liquid water and large molecular systems, particularly biomolecules in explicit solvent.
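
The energy decomposition described above (a learned semi-local term plus pairwise terms for repulsion, electrostatics, and dispersion) can be illustrated with a minimal Python sketch. This is not the SO3LR implementation: the functional forms, the parameter values (`a`, `b`, `c6`, `r0`), the reduced units, the fixed per-atom charges, and the `semi_local_energy` callable standing in for the neural network are all assumptions chosen for illustration only.

```python
import math
from itertools import combinations


def pair_distances(positions):
    """Yield (i, j, r_ij) for every unique pair of atoms."""
    for i, j in combinations(range(len(positions)), 2):
        yield i, j, math.dist(positions[i], positions[j])


def repulsion(r, a=1000.0, b=3.5):
    """Illustrative exponential short-range repulsion (hypothetical a, b)."""
    return a * math.exp(-b * r)


def electrostatics(r, qi, qj, k=1.0):
    """Coulomb interaction between point charges, in reduced units (k = 1)."""
    return k * qi * qj / r


def dispersion(r, c6=500.0, r0=3.0):
    """Illustrative damped -C6/r^6 attraction; r0 keeps it finite at small r."""
    return -c6 / (r**6 + r0**6)


def total_energy(positions, charges, semi_local_energy):
    """Sum a semi-local model energy and the three pairwise contributions.

    `semi_local_energy` is a placeholder callable standing in for a trained
    neural network; it maps positions to a scalar energy.
    """
    e_pair = 0.0
    for i, j, r in pair_distances(positions):
        e_pair += repulsion(r)
        e_pair += electrostatics(r, charges[i], charges[j])
        e_pair += dispersion(r)
    return semi_local_energy(positions) + e_pair


if __name__ == "__main__":
    # Two oppositely charged atoms 3 length units apart, no learned term.
    positions = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    charges = [0.4, -0.4]
    print(total_energy(positions, charges, lambda p: 0.0))
```

Keeping the pairwise terms as separate functions mirrors the separation in the text: the learned model only has to capture the semi-local part, while the long-range behavior comes from explicit physical expressions.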