Robust and scalable uncertainty estimation with conformal prediction for machine-learned interatomic potentials

Yuge Hu, Joseph Musielewicz, Zachary Ulissi, Andrew J. Medford
DOI: https://doi.org/10.48550/arXiv.2208.08337
2022-11-22
Abstract: Uncertainty quantification (UQ) is important to machine learning (ML) force fields to assess the level of confidence during prediction, as ML models are not inherently physical and can therefore yield catastrophically incorrect predictions. Established a-posteriori UQ methods, including ensemble methods, the dropout method, the delta method, and various heuristic distance metrics, have limitations such as being computationally challenging for large models due to model re-training. In addition, the uncertainty estimates are often not rigorously calibrated. In this work, we propose combining the distribution-free UQ method, known as conformal prediction (CP), with the distances in the neural network's latent space to estimate the uncertainty of energies predicted by neural network force fields. We evaluate this method (CP+latent) along with other UQ methods on two essential aspects, calibration, and sharpness, and find this method to be both calibrated and sharp under the assumption of independent and identically-distributed (i.i.d.) data. We show that the method is relatively insensitive to hyperparameters selected, and test the limitations of the method when the i.i.d. assumption is violated. Finally, we demonstrate that this method can be readily applied to trained neural network force fields with traditional and graph neural network architectures to obtain estimates of uncertainty with low computational costs on a training dataset of 1 million images to showcase its scalability and portability. Incorporating the CP method with latent distances offers a calibrated, sharp and efficient strategy to estimate the uncertainty of neural network force fields. In addition, the CP approach can also function as a promising strategy for calibrating uncertainty estimated by other approaches.
Chemical Physics, Computational Physics
### What problem does this paper attempt to solve?

This paper addresses uncertainty quantification (UQ) for machine-learning force fields (MLFFs). Specifically, the authors focus on providing reliable uncertainty estimates for the energies predicted by neural-network interatomic potentials. The key problems the paper addresses are:

1. **Limitations of existing UQ methods**:
   - **High computational cost**: Established a posteriori UQ methods (such as ensemble methods, the dropout method, and the delta method) are computationally expensive for large models, often because they require re-training the model.
   - **Uncalibrated uncertainty estimates**: The uncertainty estimates these methods provide are often not rigorously calibrated, so the stated confidence need not match the errors actually observed.
2. **Introduction of a new UQ method**:
   - **Combining conformal prediction (CP) with latent-space distance**: The authors propose a UQ method (CP+latent) that combines distribution-free conformal prediction with distances in the neural network's latent space. Because CP is distribution-free, the method does not need to assume that the errors follow a normal distribution, which helps the calibration of the uncertainty estimates.
   - **Improving scalability and portability**: The method can be applied to already-trained neural-network force fields at low computational cost, and it works with both traditional feed-forward and graph neural network architectures.
3. **Validating the effectiveness of the new method**:
   - **Calibration and sharpness assessment**: The authors evaluate the method on two key aspects: calibration and sharpness. Calibration checks that the stated confidence level matches the observed fraction of correct predictions, while sharpness measures how narrow the prediction intervals are.
   - **Testing performance on different datasets**: The method is tested on multiple benchmark datasets (such as MD17-Aspirin, QM9, and OC20), demonstrating its applicability and robustness across different chemical complexities and training-set sizes.
4. **Handling data that are not independent and identically distributed (non-i.i.d.)**:
   - **Exploring the limits of the i.i.d. assumption**: The authors also study how the method performs when the data violate the i.i.d. assumption, to evaluate its robustness in practical applications.

### Summary

The main goal of this paper is to develop a reliable, efficient, and scalable uncertainty quantification method for energy prediction with machine-learning force fields. By combining conformal prediction with latent-space distance, the authors provide a method that does not require re-training the model and does not depend on assumptions about the error distribution, thereby improving the calibration and reliability of the uncertainty estimates.
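
### Illustrative sketch

The combination of conformal prediction with a latent-space distance described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the latent vectors are random stand-ins for a trained network's hidden-layer outputs, the heuristic is assumed to be the mean distance to the `k` nearest training points, and the energy errors are synthetic. The names `latent_distance` and `conformal_scale` are hypothetical.

```python
import numpy as np


def latent_distance(latent_train, latent_query, k=10):
    """Mean Euclidean distance from each query point to its k nearest training points."""
    # Dense pairwise distances, shape (n_query, n_train); adequate for a small demo.
    diffs = latent_query[:, None, :] - latent_train[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


def conformal_scale(abs_residuals, heuristic, alpha=0.1):
    """Split-conformal quantile of the scores |residual| / heuristic."""
    scores = np.sort(abs_residuals / heuristic)
    n = scores.size
    # Finite-sample corrected rank ceil((n + 1)(1 - alpha)).
    # Simplification: the rank is clipped to n instead of returning infinity.
    rank = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    return scores[rank - 1]


rng = np.random.default_rng(0)
dim, n_train, n_cal, n_test = 4, 300, 2000, 2000

# Assumption: random vectors stand in for latent representations of structures.
latent_train = rng.normal(size=(n_train, dim))
latent_cal = rng.normal(scale=1.5, size=(n_cal, dim))
latent_test = rng.normal(scale=1.5, size=(n_test, dim))

u_cal = latent_distance(latent_train, latent_cal)
u_test = latent_distance(latent_train, latent_test)

# Assumption: synthetic absolute energy errors that grow with latent distance.
err_cal = np.abs(rng.normal(size=n_cal)) * 0.1 * u_cal
err_test = np.abs(rng.normal(size=n_test)) * 0.1 * u_test

alpha = 0.1  # target miscoverage, i.e. 90% prediction intervals
q_hat = conformal_scale(err_cal, u_cal, alpha)
half_width = q_hat * u_test  # interval is prediction +/- half_width

coverage = np.mean(err_test <= half_width)  # calibration check
sharpness = np.mean(2 * half_width)         # mean interval width
print(f"q_hat={q_hat:.3f} coverage={coverage:.3f} mean width={sharpness:.3f}")
```

When the calibration and test sets are drawn from the same distribution, the observed coverage should land close to `1 - alpha`; the calibration step needs only one pass over held-out residuals, so no re-training is involved. If the test data were drawn from a different distribution than the calibration data, this guarantee would no longer hold, which is the non-i.i.d. limitation noted above.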