Abstract: Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms. Our first contribution is to show that the Kullback-Leibler divergence, as an energy functional, has the unique property (among all f-divergences) that gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known as the unique choice (up to scaling) that is diffeomorphism invariant. As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine invariant Wasserstein and Stein gradient flows. Affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, in theory and by using particle methods. Our third contribution is to study, and develop efficient algorithms based on Gaussian approximations of the gradient flows; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically.
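As a rough illustration of the first contribution, the sketch below runs an unadjusted Langevin sampler, a standard particle discretization of the Wasserstein gradient flow of the KL divergence. It is not the paper's algorithm. The anisotropic Gaussian target, the step size, the finite-difference gradient, and all function names (`log_unnormalized_density`, `grad_log_density`, `langevin_sample`) are assumptions made for this example. The point is that the update uses only the gradient of the log of the unnormalized density, so rescaling the density by a constant `exp(log_scale)` leaves the particles unchanged up to rounding.

```python
import numpy as np

def log_unnormalized_density(theta, precision, log_scale):
    """Log of c * exp(-0.5 * theta^T P theta) with c = exp(log_scale).

    theta has shape (n_particles, d); precision is a (d, d) matrix.
    log_scale stands in for the unknown normalization constant.
    """
    quad = np.einsum("ni,ij,nj->n", theta, precision, theta)
    return log_scale - 0.5 * quad

def grad_log_density(theta, precision, log_scale, eps=1e-5):
    """Central finite-difference gradient of the log-density.

    The constant log_scale cancels in the difference quotient.
    """
    grad = np.zeros_like(theta)
    for j in range(theta.shape[1]):
        shift = np.zeros(theta.shape[1])
        shift[j] = eps
        plus = log_unnormalized_density(theta + shift, precision, log_scale)
        minus = log_unnormalized_density(theta - shift, precision, log_scale)
        grad[:, j] = (plus - minus) / (2.0 * eps)
    return grad

def langevin_sample(precision, log_scale=0.0, n_particles=5000,
                    n_steps=1000, step=1e-2, seed=0):
    """Unadjusted Langevin iteration applied to independent particles."""
    rng = np.random.default_rng(seed)
    d = precision.shape[0]
    theta = rng.standard_normal((n_particles, d))
    for _ in range(n_steps):
        noise = rng.standard_normal((n_particles, d))
        drift = grad_log_density(theta, precision, log_scale)
        theta = theta + step * drift + np.sqrt(2.0 * step) * noise
    return theta

# Hypothetical anisotropic target: covariance diag(1, 0.25).
precision = np.diag([1.0, 4.0])
samples = langevin_sample(precision, log_scale=0.0)
rescaled = langevin_sample(precision, log_scale=50.0)

print(np.cov(samples.T))                   # empirical covariance of the particles
print(np.max(np.abs(samples - rescaled)))  # effect of rescaling the density
```

The fixed step size introduces a small discretization bias in the sampled covariance, and the plain Langevin update is not affine invariant, so strongly anisotropic targets would need a smaller step or a preconditioned variant.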

What problem does this paper attempt to address?

The core problem this paper addresses is **how to sample efficiently from a probability distribution that is known only up to its normalization constant**, a recurring task in computational science and engineering. Specifically, the paper examines the design of gradient-flow-based sampling methods. These methods handle the unknown normalization constant of the target distribution and can lead to simpler, more efficient numerical implementations.

### Main contributions of the paper

1. **Unique properties of KL divergence**:
   - It is shown that, among all f-divergences, the KL divergence is the only energy functional whose gradient flow does not depend on the normalization constant of the target distribution. This makes the KL divergence a natural choice of energy functional for sampling, because the normalization constant never has to be known explicitly.
2. **Choice of metric and invariance**:
   - The influence of the metric on the gradient flow is studied from the perspective of invariance. The Fisher-Rao metric is known to be the unique (up to scaling) diffeomorphism-invariant metric. As a computationally tractable alternative, a relaxed affine invariance property is introduced, and various affine-invariant Wasserstein and Stein gradient flows are constructed. These affine-invariant gradient flows behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions.
3. **Gradient flow under Gaussian approximation**:
   - Efficient algorithms based on Gaussian approximations of the gradient flows are studied, which provides an alternative to particle methods. Specifically, the Gaussian approximation is obtained through either metric projection or moment closure, and it is proved that these two approaches are equivalent under certain conditions. In addition, the relationship between Gaussian approximate gradient flows and gradient methods from parametric variational inference is explored, and their convergence properties are analyzed.

### Paper structure

- **Part 1**: Introduction to the sampling problem and its importance, as well as the basic concepts of the gradient flow method.
- **Part 2**: The general form of gradient flows in the space of probability densities.
- **Part 3**: The choice of energy functional, with special emphasis on the unique properties of the KL divergence.
- **Part 4**: The choice of metric, focusing on the Fisher-Rao metric and affine invariance.
- **Part 5**: Gradient flows under Gaussian approximation, including efficient algorithms and an analysis of their convergence properties.
- **Part 6**: Application examples, demonstrating the effectiveness of the proposed methods on PDE-constrained Bayesian inverse problems.
- **Part 7**: Conclusion, summarizing the main findings and future research directions.

### Key formulas

- **KL divergence**:
  \[ E(\rho; \rho_{\text{post}}) = \text{KL}[\rho \| \rho_{\text{post}}] = \int \rho \log \left( \frac{\rho}{\rho_{\text{post}}} \right) d\theta \]
- **Gradient flow equation**:
  \[ \frac{\partial \rho_t}{\partial t} = -M(\rho_t)^{-1} \frac{\delta E}{\delta \rho} \bigg|_{\rho = \rho_t} \]
- **Wasserstein gradient flow**:
  \[ \frac{\partial \rho_t}{\partial t} = -\nabla_\theta \cdot (\rho_t \nabla_\theta \log \rho_{\text{post}}) + \nabla_\theta \cdot (\nabla_\theta \rho_t) \]
- **Fisher-Rao metric**:
  \[ g_\rho(\sigma_1, \sigma_2) = \langle M(\rho) \sigma_1, \sigma_2 \rangle = \int \frac{\sigma_1 \sigma_2}{\rho} \, d\theta \]

### Conclusion

This paper...

Sampling via Gradient Flows in the Space of Probability Measures

Efficient, Multimodal, and Derivative-Free Bayesian Inference With Fisher-Rao Gradient Flows

Stochastic Normalizing Flows

Fisher-Rao Gradient Flow: Geodesic Convexity and Functional Inequalities

Large-Scale Wasserstein Gradient Flows

Accelerated Information Gradient flow

Sampling Via Föllmer Flow

Neural Sampling from Boltzmann Densities: Fisher-Rao Curves in the Wasserstein Geometry

Sampling as optimization in the space of measures: The Langevin dynamics as a composite optimization problem

Importance Sampling With Stochastic Particle Flow and Diffusion Optimization

Sampling with flows, diffusion and autoregressive neural networks: A spin-glass perspective

Dynamical Sampling With Langevin Normalization Flows

Functional Gradient Flows for Constrained Sampling

Sampling with flows, diffusion, and autoregressive neural networks from a spin-glass perspective

A probabilistic method for gradient estimates of some geometric flows

Kernel Approximation of Fisher-Rao Gradient Flows

Lipschitz-regularized gradient flows and generative particle algorithms for high-dimensional scarce data

Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization

Solving Fredholm Integral Equations of the First Kind via Wasserstein Gradient Flows

A Sharp Convergence Theory for The Probability Flow ODEs of Diffusion Models