Statistical methods for resolving poor uncertainty quantification in machine learning interatomic potentials

Emil Annevelink, Venkatasubramanian Viswanathan

2023-08-30

Abstract: Machine learning interatomic potentials (MLIPs) are promising surrogates for quantum mechanics evaluations in ab-initio molecular dynamics simulations due to their ability to reproduce the energy and force landscape within chemical accuracy at four orders of magnitude less cost. While developing uncertainty quantification (UQ) tools for MLIPs is critical to build production MLIP datasets using active learning, only limited progress has been made, and the most robust method, ensembling, still shows low correlation between high error and high uncertainty predictions. Here we develop a rigorous method rooted in statistics for determining an error cutoff that distinguishes regions of high and low UQ performance. The statistical cutoff reveals that a main cause of the poor UQ performance is that the machine learning model already describes the entire dataset and has no datapoints with error greater than the statistical error distribution. Second, we extend the statistical analysis to create an interpretable connection between the error and uncertainty distributions to predict an uncertainty cutoff separating high and low errors. We showcase the statistical cutoff in active learning benchmarks on two datasets of varying chemical complexity for three common UQ methods (ensembling, sparse Gaussian processes, and latent distance metrics) and compare them to the true error and random sampling, showing that the statistical cutoff generalizes to a variety of UQ methods and protocols and performs similarly to using the true error. Importantly, we conclude that this uncertainty cutoff makes it possible to replace ensembling with significantly lower cost uncertainty quantification tools, such as sparse Gaussian processes and latent distances, and so to generate MLIP datasets at a fraction of the cost.

Materials Science

What problem does this paper attempt to address?

The paper attempts to address the issue of poor uncertainty quantification in machine learning interatomic potentials (MLIPs). Specifically:

1. **Development of Uncertainty Quantification Tools**: Although the development of uncertainty quantification tools is crucial for constructing production-level MLIP datasets through active learning, progress in this area has been limited. Even the most robust method, ensembling, still shows a low correlation between high error and high uncertainty predictions.
2. **Determining Thresholds for High Error and High Uncertainty**: An active learning workflow requires a threshold to distinguish between high error and high uncertainty cases. However, determining these thresholds is challenging because different uncertainty models define high uncertainty differently.
3. **Application of Statistical Methods**: The paper proposes a statistical approach to determine the threshold between high error and low error and extends the statistical analysis to create an interpretable connection between error and uncertainty distributions, thereby predicting the uncertainty threshold that separates high error from low error.

By conducting experiments on two datasets with different chemical complexities and comparing three common uncertainty quantification methods (ensemble, sparse Gaussian process, latent distance metric), the results show that this approach can be generally applied to various uncertainty quantification methods and performs similarly to using actual errors. Moreover, this method allows even poorly calibrated models to accurately classify high error data points, significantly reducing the cost of generating MLIP datasets.
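The idea of an error cutoff that is carried over to an uncertainty cutoff can be illustrated with a minimal sketch. This is not the paper's procedure, which the summary above does not spell out. The sketch assumes that absolute errors follow a half-normal distribution, that the error cutoff is the value this fitted distribution exceeds with a small probability `tail_prob`, and that the uncertainty cutoff is found by matching quantiles between the error and uncertainty distributions. All function names, `tail_prob`, and the synthetic data are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def ensemble_uncertainty(predictions: np.ndarray) -> np.ndarray:
    """Sample standard deviation across ensemble members.

    predictions has shape (n_models, n_samples).
    """
    return np.std(predictions, axis=0, ddof=1)


def half_normal_error_cutoff(errors: np.ndarray, tail_prob: float = 0.01) -> float:
    """Error magnitude that a fitted half-normal exceeds with probability tail_prob.

    Assumption (not from the paper): absolute errors are half-normal.
    """
    abs_err = np.abs(errors)
    sigma = float(np.sqrt(np.mean(abs_err**2)))  # maximum-likelihood scale
    # Solve P(|e| > c) = 2 * (1 - Phi(c / sigma)) = tail_prob for c.
    return sigma * NormalDist().inv_cdf(1.0 - tail_prob / 2.0)


def matched_uncertainty_cutoff(
    errors: np.ndarray, uncertainties: np.ndarray, error_cutoff: float
) -> float:
    """Uncertainty value at the same quantile as error_cutoff in the errors.

    Assumption (not from the paper): simple quantile matching.
    """
    frac_low = float(np.mean(np.abs(errors) <= error_cutoff))
    return float(np.quantile(uncertainties, frac_low))


# Synthetic stand-in data: 5 hypothetical ensemble members, 200 samples.
rng = np.random.default_rng(0)
preds = rng.normal(size=(5, 200))
truth = np.zeros(200)

errors = preds.mean(axis=0) - truth
sigma_u = ensemble_uncertainty(preds)
e_cut = half_normal_error_cutoff(errors)
u_cut = matched_uncertainty_cutoff(errors, sigma_u, e_cut)

# Candidates an active learning loop would send for labeling.
selected = sigma_u > u_cut
```

In this sketch, if no error exceeds the fitted cutoff, the matched quantile is the maximum uncertainty and nothing is selected. That loosely mirrors the abstract's observation that UQ appears to perform poorly when the model already describes the whole dataset.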

Single-model uncertainty quantification in neural network potentials does not consistently outperform model ensembles

Robust and scalable uncertainty estimation with conformal prediction for machine-learned interatomic potentials

Uncertainty Quantification and Propagation in Atomistic Machine Learning

Evaluation of uncertainty estimations for Gaussian process regression based machine learning interatomic potentials

Uncertainty quantification in scientific machine learning: Methods, metrics, and comparisons

Discrepancies and error evaluation metrics for machine learning interatomic potentials

On the Uncertainty Estimates of Equivariant-Neural-Network-Ensembles Interatomic Potentials

Uncertainty Quantification Using Neural Networks for Molecular Property Prediction

Global ranking of the sensitivity of interaction potential contributions within classical molecular dynamics force fields

Uncertainty Quantification Driven Machine Learning for Improving Model Accuracy in Imbalanced Regression Tasks

Methods for comparing uncertainty quantifications for material property predictions

Improved uncertainty quantification for Gaussian process regression based interatomic potentials

Scalable Bayesian Uncertainty Quantification for Neural Network Potentials: Promise and Pitfalls

Model-free quantification of completeness, uncertainties, and outliers in atomistic machine learning using information theory

Enhanced sampling of robust molecular datasets with uncertainty-based collective variables

Uncertainty-biased molecular dynamics for learning uniformly accurate interatomic potentials

Quality measures for the evaluation of machine learning architectures on the quantification of epistemic and aleatoric uncertainties in complex dynamical systems

Deep Neural Network Uncertainty Quantification for LArTPC Reconstruction

Uncertainty quantification by direct propagation of shallow ensembles