Efficient hyperparameter estimation in Bayesian inverse problems using sample average approximation

Julianne Chung,Scot M. Miller,Malena Sabate Landman,Arvind K. Saibaba
2024-12-04
Abstract:In Bayesian inverse problems, it is common to consider several hyperparameters that define the prior and the noise model that must be estimated from the data. In particular, we are interested in linear inverse problems with additive Gaussian noise and Gaussian priors defined using Matérn covariance models. In this case, we estimate the hyperparameters using the maximum a posteriori (MAP) estimate of the marginalized posterior distribution. However, this is a computationally intensive task since it involves computing log determinants. To address this challenge, we consider a stochastic average approximation (SAA) of the objective function and use the preconditioned Lanczos method to compute efficient approximations of the function and gradient evaluations. We propose a new preconditioner that can be updated cheaply for new values of the hyperparameters and an approach to compute approximations of the gradient evaluations, by reutilizing information from the function evaluations. We demonstrate the performance of our approach on static and dynamic seismic tomography problems.
Numerical Analysis
What problem does this paper attempt to address?
This paper attempts to solve the problem of efficiently estimating hyper - parameters in Bayesian inverse problems. Specifically, the authors focus on linear inverse problems, where the noise and prior are both assumed to be Gaussian distributions and are defined using the Matérn covariance model. In this case, the authors estimate the hyper - parameters by maximizing the MAP (Maximum A Posteriori) estimate of the marginal posterior distribution. However, this process is computationally intensive because it involves calculating the log - determinant. To address this challenge, the authors propose a stochastic approximation method of the objective function based on Sample Average Approximation (SAA) and use the preconditioned Lanczos method to efficiently calculate function values and gradient evaluations. ### Main problems and solutions 1. **Problem description**: - In Bayesian inverse problems, multiple hyper - parameters defining the prior and noise models need to be estimated from data. - For linear inverse problems, assume that the noise and prior are both Gaussian distributions and use the Matérn covariance model. - Estimating these hyper - parameters usually requires calculating the log - determinant, which is a computationally intensive task. 2. **Proposed solutions**: - **Sample Average Approximation (SAA)**: Approximate the objective function by introducing random variables and using the Monte Carlo method. - **Preconditioned Lanczos method**: Used to efficiently calculate the approximate value of the log - determinant and its gradient. - **New preconditioner**: Can be updated cheaply to adapt to new hyper - parameter values and does not require access to the forward/adjoint solver. - **Gradient approximation**: Calculate the gradient with almost no additional forward/adjoint applications by reusing the information in the objective function calculation. ### Formula summary - **Objective function**: \[ F(\theta)=-\log\pi_{\text{hyp}}(\theta)+\frac{1}{2}\log\det(\Psi(\theta))+\frac{1}{2}\|A\mu(\theta)-d\|^{2}_{\Psi^{-1}(\theta)} \] where \[ \Psi(\theta)=AQ(\theta)A^{\top}+R(\theta) \] - **Gradient formula**: \[ (\nabla F(\theta))_{i}=-\frac{1}{\pi_{\text{hyp}}(\theta)}\frac{\partial\pi_{\text{hyp}}(\theta)}{\partial\theta_{i}}+\frac{1}{2}\text{trace}\left(\Psi(\theta)^{-1}\frac{\partial\Psi(\theta)}{\partial\theta_{i}}\right)-\frac{1}{2}\left[\Psi(\theta)^{-1}(A\mu(\theta)-d)\right]^{\top}\left[\frac{\partial\Psi(\theta)}{\partial\theta_{i}}\Psi(\theta)^{-1}(A\mu(\theta)-d)-2A\frac{\partial\mu(\theta)}{\partial\theta_{i}}\right] \] ### Conclusion This paper proposes a new method based on sample average approximation and the preconditioned Lanczos method to efficiently estimate hyper - parameters in Bayesian inverse problems. This method significantly reduces the complexity of calculating the log - determinant and further reduces the cost of gradient calculation by reusing the information in the objective function calculation. Experimental results show that this method performs well on static and dynamic seismic tomography problems.