sGDML: Constructing Accurate and Data Efficient Molecular Force Fields Using Machine Learning

Stefan Chmiela, Huziel E. Sauceda, Igor Poltavsky, Klaus-Robert Müller, Alexandre Tkatchenko
DOI: https://doi.org/10.1016/j.cpc.2019.02.007
2019-03-02
Abstract: We present an optimized implementation of the recently proposed symmetric gradient domain machine learning (sGDML) model. The sGDML model is able to faithfully reproduce global potential energy surfaces (PES) for molecules with a few dozen atoms from a limited number of user-provided reference molecular conformations and the associated atomic forces. Here, we introduce a Python software package to reconstruct and evaluate custom sGDML force fields (FFs), without requiring in-depth knowledge about the details of the model. A user-friendly command-line interface offers assistance through the complete process of model creation, in an effort to make this novel machine learning approach accessible to broad practitioners. Our paper serves as a documentation, but also includes a practical application example of how to reconstruct and use a PBE0+MBD FF for paracetamol. Finally, we show how to interface sGDML with the FF simulation engines ASE (Larsen et al., J. Phys. Condens. Matter 29, 273002 (2017)) and i-PI (Kapil et al., Comput. Phys. Commun. 236, 214-223 (2019)) to run numerical experiments, including structure optimization, classical and path integral molecular dynamics and nudged elastic band calculations.
Computational Physics
What problem does this paper attempt to address?
This paper addresses the problem of constructing accurate, data-efficient molecular force fields (FFs) for molecules with up to a few dozen atoms. Computing a potential energy surface (PES) directly with ab initio methods demands substantial computing resources and time, so the paper uses a machine-learning approach, symmetric gradient domain machine learning (sGDML), to cut that cost while retaining accuracy. Specifically, the paper addresses the following points:

1. **Improving computational efficiency**: Ab initio methods are accurate, but their computational cost is so high that they are difficult to apply to large-scale or long-time simulations. sGDML significantly reduces this cost while maintaining high precision, enabling complex molecular dynamics simulations to run on an ordinary laptop.
2. **Data efficiency**: The sGDML model reconstructs the global potential energy surface from a limited number of reference molecular conformations and their associated atomic forces, which reduces the amount of reference data required. This matters most when the reference data come from expensive high-level electronic structure calculations.
3. **Integration of physical symmetries**: The model incorporates spatial and temporal physical symmetries and also automatically discovers the dynamic non-rigid symmetries of a molecular system (such as methyl group rotation). This reduces the intrinsic complexity of the learning problem and improves generalization.
4. **User-friendliness**: To make the method accessible to a broad range of researchers, the authors developed a Python package with a command-line interface that guides users through the whole process, from data preparation to model training, validation, and testing.
5. **Practical application example**: The paper shows, through a worked example, how to reconstruct a force field for paracetamol with sGDML and how to interface it with the simulation engines ASE and i-PI for structure optimization, molecular dynamics, and related tasks.

In summary, the main goal of the paper is to make accurate and efficient construction of molecular force fields practical through the sGDML model and its software package, providing a tool for predicting molecular behavior in long molecular dynamics simulations under realistic conditions.
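The core idea behind the data-efficiency point, learning an energy surface from forces alone, can be illustrated with a small toy sketch. This is not the sGDML package or its API. It assumes a one-dimensional coordinate, a Gaussian kernel in place of the kernel used by sGDML, no molecular descriptor, and no symmetrization. All names (`SIGMA`, `LAMBDA`, `force_kernel`, `energy_force_kernel`, the sine test potential) are hypothetical choices made for illustration. The model is fitted to forces only, and the energy is then recovered (up to an additive constant) from the same coefficients, so the predicted force is the exact negative derivative of the predicted energy.

```python
import numpy as np

SIGMA = 1.0    # kernel length scale (hypothetical choice)
LAMBDA = 1e-8  # ridge regularization strength (hypothetical choice)


def gaussian(x, y):
    """Gaussian kernel matrix and pairwise differences for 1D point sets."""
    d = x[:, None] - y[None, :]
    return np.exp(-d**2 / (2.0 * SIGMA**2)), d


def force_kernel(x, y):
    """Force-force covariance: second mixed derivative d^2 k / (dx dy)."""
    k, d = gaussian(x, y)
    return (1.0 / SIGMA**2 - d**2 / SIGMA**4) * k


def energy_force_kernel(x, y):
    """Energy-force covariance: -dk/dy, since the force is -dE/dy."""
    k, d = gaussian(x, y)
    return -(d / SIGMA**2) * k


def true_energy(x):
    return np.sin(x)


def true_force(x):
    return -np.cos(x)  # F = -dE/dx


# Training data: forces only, at a handful of reference geometries.
x_train = np.linspace(-2.0, 2.0, 9)
f_train = true_force(x_train)

# Kernel ridge regression in the gradient (force) domain.
K = force_kernel(x_train, x_train) + LAMBDA * np.eye(x_train.size)
alpha = np.linalg.solve(K, f_train)


def predict_force(x):
    return force_kernel(x, x_train) @ alpha


def predict_energy(x):
    # Defined only up to an additive constant, because no energies were used.
    return energy_force_kernel(x, x_train) @ alpha


x_test = np.linspace(-1.5, 1.5, 31)
force_err = np.max(np.abs(predict_force(x_test) - true_force(x_test)))

# Compare energies relative to the value at x = 0 to remove the constant.
zero = np.array([0.0])
e_pred = predict_energy(x_test) - predict_energy(zero)
e_true = true_energy(x_test) - true_energy(zero)
energy_err = np.max(np.abs(e_pred - e_true))

# Consistency check: predicted force equals -dE/dx of the predicted energy.
h = 1e-4
numeric_force = -(predict_energy(x_test + h) - predict_energy(x_test - h)) / (2 * h)
conservation_gap = np.max(np.abs(numeric_force - predict_force(x_test)))

print("max force error:", force_err)
print("max relative-energy error:", energy_err)
print("max |F + dE/dx|:", conservation_gap)
```

Because the energy and force predictions share one set of coefficients and the force kernel is the derivative of the energy-force kernel, the resulting force field is conservative by construction. Energy conservation is the property that a gradient-domain model is designed to guarantee.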