Efficient Generation of Stable Linear Machine-Learning Force Fields with Uncertainty-Aware Active Learning

Valerio Briganti, Alessandro Lunghi
DOI: https://doi.org/10.1088/2632-2153/ace418
2023-03-29
Abstract: Machine-learning force fields enable an accurate and universal description of the potential energy surface of molecules and materials on the basis of a training set of ab initio data. However, large-scale applications of these methods rest on the possibility to train accurate machine learning models with a small number of ab initio data. In this respect, active-learning strategies, where the training set is self-generated by the model itself, combined with linear machine-learning models are particularly promising. In this work, we explore an active-learning strategy based on linear regression and able to predict the model's uncertainty on predictions for molecular configurations not sampled by the training set, thus providing a straightforward recipe for the extension of the latter. We apply this strategy to the spectral neighbor analysis potential and show that only tens of ab initio simulations of atomic forces are required to generate stable force fields for room-temperature molecular dynamics at or close to chemical accuracy. Moreover, the method does not necessitate any conformational pre-sampling, thus requiring minimal user intervention and parametrization.
Computational Physics, Materials Science, Chemical Physics
What problem does this paper attempt to address?
The paper addresses how to generate stable linear machine-learning force fields for molecular dynamics from as few ab initio calculations as possible. The authors propose an active-learning strategy based on linear regression that predicts the model's uncertainty for molecular configurations not sampled by the training set, which gives a direct criterion for extending the training set. Applied to the spectral neighbor analysis potential (SNAP), the strategy yields force fields that are stable in room-temperature molecular dynamics, at or close to chemical accuracy, from only tens of ab initio calculations of atomic forces. It requires no conformational pre-sampling, which keeps user intervention and parametrization to a minimum. The authors also present the method as able to handle complex chemical systems and as relevant to fields such as drug discovery, metastable structure prediction, and heterogeneous catalysis, where it reduces the computational cost of electronic structure simulations. A comparison against randomly selected training sets shows that the active-learning approach lowers computational overhead while preserving model accuracy and the stability of the molecular dynamics trajectories.
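The idea of uncertainty-driven training-set extension for a linear model can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes random toy descriptors in place of SNAP bispectrum components, a cheap analytic function (`oracle`, hypothetical) in place of an ab initio calculation, and the textbook least-squares prediction variance as the uncertainty measure, which may differ from the estimator used in the paper. It also selects from a fixed candidate pool for simplicity, whereas the paper states that no conformational pre-sampling is needed. All function and variable names are illustrative.

```python
import numpy as np

def fit_linear(X, y, ridge=1e-8):
    """Least-squares fit. Returns coefficients, inverse Gram matrix, residual variance."""
    n, p = X.shape
    A_inv = np.linalg.inv(X.T @ X + ridge * np.eye(p))  # tiny ridge for numerical safety
    w = A_inv @ X.T @ y
    resid = y - X @ w
    sigma2 = float(resid @ resid) / max(n - p, 1)
    return w, A_inv, sigma2

def predictive_std(X_new, A_inv, sigma2):
    """Assumed uncertainty measure: sqrt(sigma2 * x^T (X^T X)^-1 x) for each row x."""
    leverage = np.einsum("ij,jk,ik->i", X_new, A_inv, X_new)
    return np.sqrt(sigma2 * np.maximum(leverage, 0.0))

def active_learning(pool_X, oracle, n_init=5, n_steps=10, seed=0):
    """Grow the training set by labelling the most uncertain candidate at each step."""
    rng = np.random.default_rng(seed)
    chosen = [int(i) for i in rng.choice(len(pool_X), size=n_init, replace=False)]
    labels = [oracle(pool_X[i]) for i in chosen]  # expensive reference calls
    for _ in range(n_steps):
        w, A_inv, sigma2 = fit_linear(pool_X[chosen], np.array(labels))
        std = predictive_std(pool_X, A_inv, sigma2)
        std[chosen] = -np.inf  # never pick an already-labelled configuration
        nxt = int(np.argmax(std))
        chosen.append(nxt)
        labels.append(oracle(pool_X[nxt]))  # one new reference call per step
    w, _, _ = fit_linear(pool_X[chosen], np.array(labels))
    return w, chosen

# Toy problem: a linear target plus a small non-linear term the model cannot fit.
rng = np.random.default_rng(1)
pool_X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])

def oracle(x):
    # Hypothetical stand-in for an ab initio force or energy evaluation.
    return float(x @ w_true + 0.01 * np.sin(10.0 * x[0]))

w_fit, chosen = active_learning(pool_X, oracle)
print("reference calls used:", len(chosen))
print("fitted coefficients:", w_fit)
```

In this sketch each step costs exactly one call to the reference method, so the total number of expensive evaluations is `n_init + n_steps`. Replacing the `argmax` selection with a random draw gives the random-selection baseline that the summary above compares against.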