Tell machine learning potentials what they are needed for: Simulation-oriented training exemplified for glycine

Fuchun Ge, Ran Wang, Chen Qu, Peikun Zheng, Apurba Nandi, Riccardo Conte, Paul L. Houston, Joel M. Bowman, Pavlo O. Dral
DOI: https://doi.org/10.1021/acs.jpclett.4c00746
2024-04-08
Abstract: Machine learning potentials (MLPs) are widely applied as an efficient alternative way to represent potential energy surfaces (PES) in many chemical simulations. The MLPs are often evaluated with the root-mean-square errors on the test set drawn from the same distribution as the training data. Here, we systematically investigate the relationship between such test errors and the simulation accuracy with MLPs on an example of a full-dimensional, global PES for the glycine amino acid. Our results show that the errors in the test set do not unambiguously reflect the MLP performance in different simulation tasks such as relative conformer energies, barriers, vibrational levels, and zero-point vibrational energies. We also offer an easily accessible solution for improving the MLP quality in a simulation-oriented manner, yielding the most precise relative conformer energies and barriers. This solution also passed the stringent test by the diffusion Monte Carlo simulations.
Chemical Physics
What problem does this paper attempt to address?
The paper addresses how machine learning potentials (MLPs) should be evaluated and trained for chemical simulations. MLP performance is typically judged by the root-mean-square error (RMSE) on a test set drawn from the same distribution as the training data. Using a full-dimensional, global potential energy surface for glycine as the example, the authors systematically study how this test error relates to simulation accuracy. They find that the test-set error does not unambiguously reflect MLP performance in simulation tasks such as relative conformer energies, barriers, vibrational levels, and zero-point vibrational energies.
The paper emphasizes that the test-set RMSE alone is insufficient for evaluating an MLP: it can overlook subtle differences in specific regions of the potential energy surface, and it can be misleading when the data distribution is different.
To address this, the authors propose a simulation-oriented training strategy that adjusts the weights used during training to improve MLP accuracy in the low-energy region, which is the region that matters most in practical simulations. The strategy yields the most precise relative conformer energies and barriers, and it passes a stringent test with diffusion Monte Carlo simulations. With this approach, the neural network (NN) type MLPs reach the level of MLPs based on permutationally invariant polynomials (PIPs). The approach is expected to carry over to similar cases and to improve other NN potentials and their applications.
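The contrast between a plain test RMSE and a low-energy-weighted error can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the weighting function `delta / (E - E_min + delta)`, the `delta` parameter, the helper names, and the toy energies are hypothetical and are not taken from the paper, whose actual weighting scheme is not stated in this summary.

```python
import math


def energy_weights(ref_energies, delta):
    """Hypothetical weights favoring low-energy points: delta / (E - E_min + delta)."""
    e_min = min(ref_energies)
    return [delta / (e - e_min + delta) for e in ref_energies]


def rmse(pred, ref):
    """Plain (unweighted) root-mean-square error."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def weighted_rmse(pred, ref, weights):
    """Root of the weighted mean squared error, normalized by the weight sum."""
    num = sum(w * (p - r) ** 2 for p, r, w in zip(pred, ref, weights))
    return math.sqrt(num / sum(weights))


# Toy reference energies in kcal/mol (made-up numbers, not data from the paper).
ref = [0.0, 1.0, 2.0, 50.0, 100.0, 200.0]

# Two hypothetical models with the same error magnitudes placed in different regions.
errors_a = [2.0, 2.0, 2.0, 0.0, 0.0, 0.0]  # inaccurate near the minimum
errors_b = [0.0, 0.0, 0.0, 2.0, 2.0, 2.0]  # inaccurate only at high energy
pred_a = [r + e for r, e in zip(ref, errors_a)]
pred_b = [r + e for r, e in zip(ref, errors_b)]

w = energy_weights(ref, delta=10.0)  # delta is an arbitrary illustrative choice

print("plain RMSE:   ", rmse(pred_a, ref), rmse(pred_b, ref))
print("weighted RMSE:", weighted_rmse(pred_a, ref, w), weighted_rmse(pred_b, ref, w))
```

In this toy setup both models have the same plain RMSE, yet the weighted error separates them, because model A places its errors near the minimum where conformer energies and barriers are determined. Using such weights inside a training loss, rather than only as a metric, is one plausible way to steer a fit toward the low-energy region.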
In conclusion, the paper reveals the limitations of relying solely on the test set RMSE to evaluate MLPs and proposes a simulation-guided training strategy to ensure the reliability of MLPs in chemical simulations, especially in describing the rich conformational space of molecules.