Analytical results for uncertainty propagation through trained machine learning regression models

Andrew Thompson
2024-05-08
Abstract: Machine learning (ML) models are increasingly being used in metrology applications. However, for ML models to be credible in a metrology context they should be accompanied by principled uncertainty quantification. This paper addresses the challenge of uncertainty propagation through trained (fixed) ML regression models. Analytical expressions for the mean and variance of the model output are presented for certain input data distributions and for a variety of ML models. Our results cover several popular ML models including linear regression, penalised linear regression, kernel ridge regression, Gaussian Processes (GPs), support vector machines (SVMs) and relevance vector machines (RVMs). We present numerical experiments in which we validate our methods and compare them with a Monte Carlo approach from a computational efficiency point of view. We also illustrate our methods in the context of a metrology application, namely modelling the state-of-health of lithium-ion cells based upon Electrical Impedance Spectroscopy (EIS) data.
Machine Learning
What problem does this paper attempt to address?
The paper addresses the challenge of propagating uncertainty through machine learning (ML) regression models. Specifically, it aims to characterize the uncertainty of a trained model's output given the input variables and their associated uncertainties. The focus is on regression models, which are particularly common in metrology applications. The main contributions of the paper are:
1. Analytical expressions for the output uncertainty of several popular ML models, such as linear regression, ridge regression, kernel ridge regression, support vector machines, and Gaussian processes.
2. Results that are in some cases novel and that complement known results, giving as complete a picture as possible.
3. Validation of the methods in a practical application, modelling the state of health of lithium-ion batteries, and comparison with Monte Carlo sampling.
4. An exploration of the computational trade-off between the analytical methods and Monte Carlo sampling, finding that for a given model Monte Carlo sampling becomes the more expensive option once the required error tolerance falls below a certain threshold.
In summary, the paper provides an analytical framework for evaluating the output uncertainty of fixed ML models, thereby supporting more reliable results in metrology applications.
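To make the idea of analytical propagation concrete, the simplest case mentioned above (a fixed linear regression model) can be sketched as follows. This is an illustrative sketch, not the paper's code: the weights `w`, intercept `b`, input mean `mu`, and input covariance `Sigma` are made-up values, and the function names are hypothetical. For y = w·x + b, the output mean is w·mu + b and the output variance is wᵀ Sigma w for any input distribution with that mean and covariance; the Monte Carlo check below additionally assumes a Gaussian input so that samples can be drawn.

```python
import numpy as np


def linear_output_moments(w, b, mu, Sigma):
    """Exact mean and variance of y = w @ x + b for an input x
    with mean vector mu and covariance matrix Sigma."""
    w = np.asarray(w, dtype=float)
    mean = float(w @ mu + b)
    var = float(w @ Sigma @ w)
    return mean, var


def monte_carlo_moments(predict, mu, Sigma, n_samples, rng):
    """Sample-based estimate of the output mean and variance,
    assuming (for illustration) a Gaussian input distribution."""
    x = rng.multivariate_normal(mu, Sigma, size=n_samples)
    y = predict(x)
    return float(y.mean()), float(y.var(ddof=1))


# Hypothetical "trained" linear model and input uncertainty (illustrative values).
w = np.array([0.5, -1.2, 2.0])
b = 0.3
mu = np.array([1.0, 0.0, -0.5])
Sigma = np.array([
    [0.04, 0.01, 0.00],
    [0.01, 0.09, 0.02],
    [0.00, 0.02, 0.16],
])

analytic_mean, analytic_var = linear_output_moments(w, b, mu, Sigma)

rng = np.random.default_rng(0)
mc_mean, mc_var = monte_carlo_moments(lambda x: x @ w + b, mu, Sigma, 200_000, rng)

print(f"analytic:    mean={analytic_mean:.4f}, var={analytic_var:.4f}")
print(f"Monte Carlo: mean={mc_mean:.4f}, var={mc_var:.4f}")
```

The analytical route costs a couple of matrix-vector products, whereas the standard error of the Monte Carlo estimate shrinks only as one over the square root of the number of samples, which is the kind of accuracy-versus-cost trade-off the summary refers to.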