Efficient, Multimodal, and Derivative-Free Bayesian Inference With Fisher-Rao Gradient Flows

Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M. Stuart
2024-10-11
Abstract: In this paper, we study efficient approximate sampling for probability distributions known up to normalization constants. We specifically focus on a problem class arising in Bayesian inference for large-scale inverse problems in science and engineering applications. The computational challenges we address with the proposed methodology are: (i) the need for repeated evaluations of expensive forward models; (ii) the potential existence of multiple modes; and (iii) the fact that gradient of, or adjoint solver for, the forward model might not be feasible. While existing Bayesian inference methods meet some of these challenges individually, we propose a framework that tackles all three systematically. Our approach builds upon the Fisher-Rao gradient flow in probability space, yielding a dynamical system for probability densities that converges towards the target distribution at a uniform exponential rate. This rapid convergence is advantageous for the computational burden outlined in (i). We apply Gaussian mixture approximations with operator splitting techniques to simulate the flow numerically; the resulting approximation can capture multiple modes thus addressing (ii). Furthermore, we employ the Kalman methodology to facilitate a derivative-free update of these Gaussian components and their respective weights, addressing the issue in (iii). The proposed methodology results in an efficient derivative-free sampler flexible enough to handle multi-modal distributions: Gaussian Mixture Kalman Inversion (GMKI). The effectiveness of GMKI is demonstrated both theoretically and numerically in several experiments with multimodal target distributions, including proof-of-concept and two-dimensional examples, as well as a large-scale application: recovering the Navier-Stokes initial condition from solution data at positive times.
Machine Learning, Dynamical Systems, Numerical Analysis
What problem does this paper attempt to address?
The paper addresses how to sample efficiently and approximately from probability distributions that are known only up to a normalization constant. It focuses on Bayesian inference for large-scale inverse problems arising in science and engineering. The authors note that existing Bayesian inference methods handle some of the following computational difficulties individually, but do not tackle all three systematically:

1. **Expensive forward model evaluation**: The forward model is costly to evaluate and must be evaluated repeatedly, so the number of evaluations dominates the computational cost.
2. **Multimodality**: The target distribution may have multiple modes, which many existing methods explore slowly or fail to capture.
3. **Infeasible gradients or adjoint solvers**: The gradient of, or an adjoint solver for, the forward model may be unavailable or too expensive to compute.

To meet these challenges, the authors propose a framework called Gaussian Mixture Kalman Inversion (GMKI). It combines the Fisher-Rao gradient flow, Gaussian mixture approximations, and the Kalman methodology, yielding a derivative-free sampler that converges rapidly and can represent multimodal distributions. Each ingredient targets one of the difficulties above:

- **Fisher-Rao gradient flow**: The Fisher-Rao gradient flow defines a dynamical system for probability densities that converges to the target distribution at a uniform exponential rate. This rapid convergence helps limit the number of forward model evaluations.
- **Gaussian mixture approximation**: The flow is simulated numerically with a Gaussian mixture approximation and operator splitting, so the resulting approximation can capture multiple modes.
- **Kalman methodology**: The Gaussian components and their weights are updated with the Kalman methodology, which makes the updates derivative-free.

Together, these techniques give an efficient posterior approximation method suited to inverse problems in which forward model evaluations are expensive, the posterior is multimodal, and gradient information is unavailable. The paper supports GMKI with theoretical analysis and numerical experiments on multimodal target distributions, including proof-of-concept and two-dimensional examples and a large-scale application: recovering the Navier-Stokes initial condition from solution data at positive times.
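The exponential convergence of the Fisher-Rao gradient flow can be illustrated in one dimension. The sketch below is not GMKI: it uses no Gaussian mixture, no operator splitting, and no Kalman update. It relies instead on the closed-form solution of the Fisher-Rao gradient flow of the KL divergence, in which the density at time t is proportional to ρ_0^(e^(-t)) · ρ_target^(1 - e^(-t)), and evaluates that solution on a grid. The bimodal target, the Gaussian initial density, the grid, and all function names are assumptions chosen purely for illustration.

```python
import numpy as np

def fisher_rao_density(log_rho0, log_target, t, dx):
    """Density at time t on a uniform grid, from the closed-form
    geometric interpolation between initial and target log-densities."""
    w = np.exp(-t)
    log_rho = w * log_rho0 + (1.0 - w) * log_target
    log_rho = log_rho - log_rho.max()   # stabilise before exponentiating
    rho = np.exp(log_rho)
    return rho / (rho.sum() * dx)       # normalise numerically on the grid

def kl_divergence(rho, target, dx):
    """Grid approximation of KL(rho || target)."""
    mask = rho > 0
    return float(np.sum(rho[mask] * (np.log(rho[mask]) - np.log(target[mask]))) * dx)

# Uniform grid (hypothetical choice, wide enough to contain both modes).
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]

# Hypothetical bimodal target, specified only up to a normalisation constant.
log_target_unnorm = np.logaddexp(
    -0.5 * (x - 3.0) ** 2 / 0.5 ** 2,
    -0.5 * (x + 3.0) ** 2 / 0.5 ** 2,
)
target = np.exp(log_target_unnorm - log_target_unnorm.max())
target /= target.sum() * dx

# Broad, unnormalised Gaussian initial density centred between the modes.
log_rho0 = -0.5 * x ** 2 / 4.0 ** 2

times = [0.0, 1.0, 2.0, 4.0, 8.0]
kls = [
    kl_divergence(fisher_rao_density(log_rho0, log_target_unnorm, t, dx), target, dx)
    for t in times
]
for t, kl in zip(times, kls):
    print(f"t = {t:4.1f}   KL(rho_t || target) = {kl:.3e}")
```

Only the unnormalised log-target enters `fisher_rao_density`, which mirrors the setting of distributions known up to a normalization constant. The grid evaluation is feasible here only because the example is one-dimensional with a cheap target; in the setting the paper describes, each target evaluation requires an expensive forward model run, which is why a parametric approximation such as a Gaussian mixture is used instead.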