Spatially resolved uncertainties for machine learning potentials

Esther Heid, Johannes Schörghuber, Ralf Wanzenböck, Georg K. H. Madsen
DOI: https://doi.org/10.26434/chemrxiv-2024-k27ps
2024-05-02
Abstract: Machine learning potentials have become an essential tool for atomistic simulations, yielding results close to ab-initio simulations at a fraction of computational cost. With recent improvements on the achievable accuracies, the focus has now shifted on the dataset composition itself. The reliable identification of erroneously predicted configurations to extend a given dataset is therefore of high priority. Yet, uncertainty estimation techniques have largely failed for machine learning potentials. Consequently, a general and versatile method to correlate energy or atomic force uncertainties with the model error has remained elusive to date. In the current work, we show that epistemic uncertainty cannot correlate with model error by definition, but can be aggregated over groups of atoms to yield a strong correlation. We demonstrate that our method correctly estimates prediction errors both globally per structure, and locally resolved per atom. The direct correlation of local uncertainty and local error is used to design an active learning framework based on identifying local sub-regions of a large simulation cell, and performing ab-initio calculations only for the sub-region subsequently. We successfully utilize this method to perform active learning in the low-data regime for liquid water.
Chemistry
What problem does this paper attempt to address?
This paper addresses uncertainty estimation for machine learning potentials, that is, how to reliably identify configurations a model predicts poorly so that they can be added to the training dataset. Existing uncertainty estimation techniques have largely failed for machine learning potentials, and no general method has correlated energy or atomic force uncertainties with the model error. The authors argue that the epistemic uncertainty (the reducible uncertainty caused by insufficient data) of individual data points cannot correlate with model error by definition, but that aggregating it over groups of atoms yields a strong correlation. The proposed approach estimates prediction errors both globally per structure and locally per atom. The authors use it to design an active learning framework that identifies high-uncertainty local sub-regions of a large simulation cell, so that ab-initio calculations are performed only for those sub-regions. They demonstrate the method with active learning in the low-data regime for liquid water.
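The idea of aggregating per-atom uncertainties over groups of atoms and then picking a local sub-region can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the root-mean-square aggregation, the spherical cutoff neighborhood, and the omission of periodic boundary conditions are all assumptions made for illustration.

```python
import numpy as np


def aggregate_local_uncertainty(positions, atom_uncertainty, cutoff):
    """Aggregate per-atom uncertainties over each atom's neighborhood.

    Assumption: the aggregate is the root mean square of the uncertainties
    of all atoms within `cutoff` of the central atom (the atom itself
    included). Periodic boundary conditions are ignored for simplicity.
    """
    positions = np.asarray(positions, dtype=float)
    u = np.asarray(atom_uncertainty, dtype=float)
    # Pairwise distance matrix, shape (n_atoms, n_atoms).
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    neighbors = dist <= cutoff
    counts = neighbors.sum(axis=1)
    mean_sq = (neighbors * u[None, :] ** 2).sum(axis=1) / counts
    return np.sqrt(mean_sq)


def select_subregion(positions, atom_uncertainty, cutoff):
    """Return the atom with the highest aggregated uncertainty and the
    indices of all atoms within `cutoff` of it (hypothetical helper)."""
    positions = np.asarray(positions, dtype=float)
    aggregated = aggregate_local_uncertainty(positions, atom_uncertainty, cutoff)
    center = int(np.argmax(aggregated))
    dist = np.linalg.norm(positions - positions[center], axis=1)
    members = np.flatnonzero(dist <= cutoff)
    return center, members


# Toy example: four atoms on a line with made-up uncertainties.
positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
atom_uncertainty = [0.3, 0.4, 0.0, 0.2]
center, members = select_subregion(positions, atom_uncertainty, cutoff=1.5)
print(center, members)
```

In an active learning loop of the kind summarized here, the selected atoms would then be cut out of the large cell and passed to an ab-initio calculation; how the sub-region is cut out and terminated is outside the scope of this sketch.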