${\it Asparagus}$: A Toolkit for Autonomous, User-Guided Construction of Machine-Learned Potential Energy Surfaces

Kai Töpfer, Luis Itza Vazquez-Salazar, Markus Meuwly
2024-07-21
Abstract: With the establishment of machine learning (ML) techniques in the scientific community, the construction of ML potential energy surfaces (ML-PES) has become a standard process in physics and chemistry. So far, improvements in the construction of ML-PES models have been conducted independently, creating an initial hurdle for new users to overcome and complicating the reproducibility of results. Aiming to reduce the bar for the extensive use of ML-PES, we introduce ${\it Asparagus}$, a software package encompassing the different parts into one coherent implementation that allows an autonomous, user-guided construction of ML-PES models. ${\it Asparagus}$ combines capabilities of initial data sampling with interfaces to ${\it ab initio}$ calculation programs, ML model training, as well as model evaluation and its application within other codes such as ASE or CHARMM. The functionalities of the code are illustrated in different examples, including the dynamics of small molecules, the representation of reactive potentials in organometallic compounds, and atom diffusion on periodic surface structures. The modular framework of ${\it Asparagus}$ is designed to allow simple implementations of further ML-related methods and models to provide constant user-friendly access to state-of-the-art ML techniques.
Chemical Physics; Computational Engineering, Finance, and Science; Machine Learning; Computational Physics
### What problem does this paper attempt to address?

This paper addresses several challenges in constructing machine-learned potential energy surfaces (ML-PES). Despite significant progress in ML model architectures, consistent and reproducible workflow tools are still lacking, which makes it hard for new users to get started and for results to be reproduced. To lower the barrier to using ML-PES, the authors introduce Asparagus, a software package that integrates data generation, sampling techniques, data management, model training, testing, and evaluation tools into one modular workflow, supporting users in building ML-PES models autonomously. By simplifying these steps, Asparagus lowers the entry barrier for new users and makes the physico-chemical evaluation of models and their use in molecular dynamics simulations more efficient.

### Solution

Asparagus is a Python package that provides a streamlined, extensible workflow with a user-friendly command structure for constructing ML-PES. It implements the following functions:

1. **Sampling Reference Structures**: Generates the reference structures needed for the initial ML-PES through various sampling methods (e.g., molecular dynamics, Monte Carlo, normal mode sampling, and metadynamics). Asparagus also supports importing samples from existing reference data sources and computing reference properties of the sampled structures (energies, atomic forces, atomic charges, molecular dipole moments, etc.).
2. **Model Training**: Splits the reference data in the database into training, validation, and test sets, defines the loss function, and initializes the PyTorch optimizer. Asparagus currently supports the PhysNet and PaiNN neural network architectures, and its modular design allows other established ML architectures to be added easily.
3. **Testing**: Provides functions to evaluate the accuracy of property predictions on the reference dataset or its subsets, returning statistical metrics (such as the mean absolute error, MAE, and the root mean square error, RMSE) and generating correlation plots and prediction error histograms.
4. **Feature Characterization**: Includes tools such as minimum energy path (MEP) and minimum dynamic path (MDP) searches and diffusion Monte Carlo (DMC) sampling, which further verify the accuracy and stability of the ML-PES or identify regions of configuration space that require additional samples.
5. **Interfaces**: Provides interfaces to molecular simulation programs such as ASE and CHARMM, so that the ML-PES can be used for molecular dynamics simulations or other applications.

### Features

- **Modular Design**: The modular structure of Asparagus allows new ML-related methods and models to be added without modifying other modules or parameter pipelines.
- **User-Friendly**: Default input parameters allow a quick setup of ML-PES construction, while fine-tuning for specific needs remains possible.
- **Scalability**: Interfaces to other simulation packages widen the range of applications of the ML-PES.

Through these features, Asparagus aims to give the research community an efficient, consistent, and easy-to-use tool that promotes the widespread application of ML-PES.
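The data split described under Model Training and the error metrics named under Testing can be illustrated with a minimal, library-independent sketch. This is not the Asparagus API: the function names `split_indices`, `mae`, and `rmse`, the 80/10/10 split ratio, and the fixed random seed are all assumptions made for illustration only.

```python
import math
import random


def split_indices(n_samples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle sample indices and split them into train/validation/test lists.

    Hypothetical helper; the 80/10/10 default is an assumed ratio.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_samples))
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_train = int(fractions[0] * n_samples)
    n_valid = int(fractions[1] * n_samples)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test


def mae(reference, predicted):
    """Mean absolute error between reference and predicted values."""
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)


def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    squared = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    return math.sqrt(squared / len(reference))


if __name__ == "__main__":
    train, valid, test = split_indices(100)
    print(len(train), len(valid), len(test))

    # Made-up reference and predicted energies, purely for demonstration.
    e_ref = [1.0, 2.0, 3.0]
    e_pred = [1.5, 2.0, 2.0]
    print(mae(e_ref, e_pred), rmse(e_ref, e_pred))
```

Because RMSE squares each deviation before averaging, it weights large outliers more heavily than MAE, which is one reason to report both metrics side by side.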