Single-model uncertainty quantification in neural network potentials does not consistently outperform model ensembles

Aik Rui Tan, Shingo Urata, Samuel Goldman, Johannes C. B. Dietschreit, Rafael Gómez-Bombarelli
DOI: https://doi.org/10.1038/s41524-023-01180-8
2023-05-03
Abstract: Neural networks (NNs) often assign high confidence to their predictions, even for points far out-of-distribution, making uncertainty quantification (UQ) a challenge. When they are employed to model interatomic potentials in materials systems, this problem leads to unphysical structures that disrupt simulations, or to biased statistics and dynamics that do not reflect the true physics. Differentiable UQ techniques can find new informative data and drive active learning loops for robust potentials. However, a variety of UQ techniques, including newly developed ones, exist for atomistic simulations, and there are no clear guidelines for which are most effective or suitable for a given case. In this work, we examine multiple UQ schemes for improving the robustness of NN interatomic potentials (NNIPs) through active learning. In particular, we compare incumbent ensemble-based methods against strategies that use single, deterministic NNs: mean-variance estimation (MVE), deep evidential regression, and Gaussian mixture models (GMM). We explore three datasets ranging from in-domain interpolative learning to more extrapolative out-of-domain generalization challenges: rMD17, ammonia inversion, and bulk silica glass. Performance is measured across multiple metrics relating model error to uncertainty. Our experiments show that no method consistently outperformed the others across the various metrics. Ensembling remained better at generalization and for NNIP robustness; MVE proved effective only for in-domain interpolation, while GMM was better out-of-domain; and evidential regression, despite its promise, was not the preferable alternative in any of the cases. More broadly, cost-effective, single deterministic models cannot yet consistently match or outperform ensembling for uncertainty quantification in NNIPs.
Machine Learning, Chemical Physics
What problem does this paper attempt to address?
The paper addresses uncertainty quantification (UQ) for neural network interatomic potentials (NNIPs) used to simulate materials systems. Neural networks often assign high confidence to their predictions even for points far out of distribution. In NNIP simulations, this overconfidence can produce unphysical structures that disrupt the simulation, or biased statistics and dynamics that do not reflect the true physics. Differentiable UQ techniques can identify new informative data and drive active learning loops that make the potentials more robust. However, many UQ techniques, including newly developed ones, exist for atomistic simulations, and there are no clear guidelines on which is most effective or best suited to a given case. The paper therefore compares the incumbent ensemble-based approach with strategies that use a single deterministic neural network (mean-variance estimation (MVE), deep evidential regression, and Gaussian mixture models (GMM)) and assesses how well each improves NNIP robustness through active learning. The experiments cover three datasets that range from in-domain interpolative learning to more extrapolative, out-of-domain generalization: rMD17, ammonia inversion, and bulk silica glass. Performance is measured with multiple metrics relating model error to predicted uncertainty, and no single method consistently outperforms the others across all of them.
Ensembling still performs best for generalization and for NNIP robustness; MVE proves effective only for in-domain interpolation; GMM performs better out of domain; and deep evidential regression, despite its promise, is not the preferred alternative in any of the cases studied. More broadly, cost-effective single deterministic models cannot yet consistently match or outperform ensembles for UQ in NNIPs. In summary, the paper tackles the problem of choosing a UQ method for atomistic simulations with NNIPs and, by benchmarking the alternatives side by side, gives practitioners a basis for selecting the UQ strategy that best improves the reliability and applicability of their potentials.
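Of the single-model alternatives, MVE is the simplest to state. In its usual form (the standard Gaussian negative log-likelihood objective, assumed here rather than quoted from the paper), one network outputs both a mean $\mu_\theta(x)$ and a variance $\sigma^2_\theta(x)$ and is trained over $N$ training points by minimizing, with the constant term dropped:

```latex
\mathcal{L}_{\mathrm{MVE}}(\theta)
  = \frac{1}{N}\sum_{i=1}^{N}
    \left[
      \frac{1}{2}\log \sigma^2_\theta(x_i)
      + \frac{\bigl(y_i - \mu_\theta(x_i)\bigr)^2}{2\,\sigma^2_\theta(x_i)}
    \right]
```

The predicted $\sigma_\theta(x)$ then serves as the uncertainty at inference time, so MVE needs one forward pass per configuration, whereas an ensemble of $M$ models needs $M$.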