Tell machine learning potentials what they are needed for: Simulation-oriented training exemplified for glycine

Fuchun Ge, Ran Wang, Chen Qu, Peikun Zheng, Apurba Nandi, Riccardo Conte, Paul L. Houston, Joel M. Bowman, Pavlo O. Dral

DOI: https://doi.org/10.1021/acs.jpclett.4c00746

2024-04-08

Abstract: Machine learning potentials (MLPs) are widely applied as an efficient alternative way to represent potential energy surfaces (PES) in many chemical simulations. The MLPs are often evaluated with the root-mean-square errors on the test set drawn from the same distribution as the training data. Here, we systematically investigate the relationship between such test errors and the simulation accuracy with MLPs on an example of a full-dimensional, global PES for the glycine amino acid. Our results show that the errors in the test set do not unambiguously reflect the MLP performance in different simulation tasks such as relative conformer energies, barriers, vibrational levels, and zero-point vibrational energies. We also offer an easily accessible solution for improving the MLP quality in a simulation-oriented manner, yielding the most precise relative conformer energies and barriers. This solution also passed the stringent test by the diffusion Monte Carlo simulations.

Chemical Physics

What problem does this paper attempt to address?

The paper examines how machine learning potentials (MLPs) are evaluated in chemical simulations. MLP performance is typically judged by the root-mean-square error (RMSE) on a test set. The authors find that this error does not reliably reflect how an MLP performs in different simulation tasks, such as relative conformer energies, barriers, vibrational levels, and zero-point vibrational energies. Using the full-dimensional, global potential energy surface (PES) of glycine as an example, they systematically study the relationship between test error and simulation accuracy.

The paper emphasizes that test-set RMSE alone is insufficient for evaluating MLPs: it can overlook subtle differences in specific regions of the PES, and it can be misleading when the data distribution differs. To address this, the authors propose a simulation-oriented training strategy that adjusts the weights during training to improve MLP accuracy in the low-energy region, which matters most in practical simulations. With this approach, MLPs of the neural network (NN) type reach the level of MLPs based on permutationally invariant polynomials (PIPs) and yield the most precise relative conformer energies and barriers. The solution also passed a stringent test with diffusion Monte Carlo simulations. The method is expected to apply to similar cases and to improve other NN potentials and applications.

In conclusion, the paper reveals the limitations of relying solely on test-set RMSE to evaluate MLPs and proposes a simulation-oriented training strategy to make MLPs reliable in chemical simulations, especially for describing the rich conformational space of molecules.
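
The gap between a plain test-set RMSE and accuracy in the low-energy region can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the weight form `delta / (E - E_min + delta)`, the value of `delta`, the function names, and the synthetic energies are hypothetical and are not taken from the paper, which may use a different weighting scheme. The sketch only shows that two models with the same unweighted RMSE can differ sharply once low-energy points count for more.

```python
import numpy as np

def energy_weights(energies, delta=0.1):
    """Hypothetical weights that decay with energy above the lowest point.

    The lowest-energy point gets weight 1; higher-energy points get less.
    `delta` (same units as the energies) is an assumed tuning parameter.
    """
    e = np.asarray(energies, dtype=float)
    return delta / (e - e.min() + delta)

def rmse(pred, ref):
    """Plain root-mean-square error over all points."""
    diff = np.asarray(pred, dtype=float) - np.asarray(ref, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def weighted_rmse(pred, ref, weights):
    """RMSE in which each squared error is scaled by its weight."""
    diff = np.asarray(pred, dtype=float) - np.asarray(ref, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.sqrt(np.sum(w * diff ** 2) / np.sum(w)))

# Synthetic reference energies (arbitrary units): three low, two high.
ref = np.array([0.0, 0.01, 0.02, 0.5, 1.0])

# Model A is wrong only at the low-energy points.
model_a = ref + np.array([0.005, 0.005, 0.005, 0.0, 0.0])

# Model B is wrong only at the high-energy points, with the error size
# chosen so that its plain RMSE matches model A.
err_b = 0.005 * np.sqrt(1.5)
model_b = ref + np.array([0.0, 0.0, 0.0, err_b, err_b])

w = energy_weights(ref)

plain_a, plain_b = rmse(model_a, ref), rmse(model_b, ref)
weighted_a = weighted_rmse(model_a, ref, w)
weighted_b = weighted_rmse(model_b, ref, w)

print(f"plain RMSE:    A={plain_a:.5f}  B={plain_b:.5f}")
print(f"weighted RMSE: A={weighted_a:.5f}  B={weighted_b:.5f}")
```

Under these assumptions the two models are indistinguishable by plain RMSE, while the weighted metric penalizes model A, whose errors sit in the low-energy region. Using such weights in the training loss, rather than only in evaluation, is one way to steer a fit toward the region a simulation will actually sample.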
