Petr Mokrov, Alexander Korotin, Lingxiao Li, Aude Genevay, Justin Solomon, Evgeny Burnaev
Abstract: Wasserstein gradient flows provide a powerful means of understanding and solving many diffusion equations. Specifically, Fokker-Planck equations, which model the diffusion of probability measures, can be understood as gradient descent over entropy functionals in Wasserstein space. This equivalence, introduced by Jordan, Kinderlehrer and Otto, inspired the so-called JKO scheme to approximate these diffusion processes via an implicit discretization of the gradient flow in Wasserstein space. Solving the optimization problem associated to each JKO step, however, presents serious computational challenges. We introduce a scalable method to approximate Wasserstein gradient flows, targeted to machine learning applications. Our approach relies on input-convex neural networks (ICNNs) to discretize the JKO steps, which can be optimized by stochastic gradient descent. Unlike previous work, our method does not require domain discretization or particle simulation. As a result, we can sample from the measure at each time step of the diffusion and compute its probability density. We demonstrate our algorithm's performance by computing diffusions following the Fokker-Planck equation and apply it to unnormalized density sampling as well as nonlinear filtering.
What problem does this paper attempt to address?
This paper addresses the problem of efficiently approximating Wasserstein gradient flows in high-dimensional spaces, with a focus on machine-learning applications. Its approach is to use input-convex neural networks (ICNNs) to approximate each step of the flow. Wasserstein gradient flows are a powerful tool for understanding many diffusion equations, most notably the Fokker-Planck equation, which describes how a probability measure diffuses over time. Such processes can be viewed as gradient descent on an entropy functional in Wasserstein space. The classical JKO (Jordan-Kinderlehrer-Otto) scheme discretizes this flow in time. However, each of its steps requires solving an optimization problem involving the Wasserstein distance, which is computationally demanding.
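The objects named above can be written out in one standard notation. The symbols are our choice and may differ from the paper's: a potential $\Phi$, an inverse temperature $\beta > 0$, and a JKO step size $h > 0$.

The first two lines give the Fokker-Planck equation and the functional $\mathcal{F}$ it descends. The third line gives the JKO step. The last line uses Brenier's theorem: writing the next measure as a pushforward $(\nabla\psi)_{\#}\rho^{(k)}$ of a convex $\psi$ turns both the Wasserstein term and the entropy term into expectations over the current measure.

```latex
\begin{aligned}
&\frac{\partial \rho_t}{\partial t} = \nabla \cdot \left( \rho_t \nabla \Phi \right) + \beta^{-1} \Delta \rho_t, \\
&\mathcal{F}(\rho) = \int \Phi(x)\, \rho(x)\, dx + \beta^{-1} \int \rho(x) \log \rho(x)\, dx, \\
&\rho^{(k+1)} = \operatorname*{arg\,min}_{\rho} \; \frac{1}{2h} W_2^2\!\left( \rho^{(k)}, \rho \right) + \mathcal{F}(\rho), \\
&J_k(\psi) = \frac{1}{2h}\, \mathbb{E}_{x \sim \rho^{(k)}} \|x - \nabla\psi(x)\|^2
 + \mathbb{E}_{x \sim \rho^{(k)}} \Phi\!\left(\nabla\psi(x)\right)
 - \beta^{-1}\, \mathbb{E}_{x \sim \rho^{(k)}} \log\det \nabla^2 \psi(x)
 + \beta^{-1} \int \rho^{(k)} \log \rho^{(k)}\, dx,
 \qquad \rho^{(k+1)} = (\nabla \psi^{\star})_{\#}\, \rho^{(k)}, \ \psi^{\star} = \operatorname*{arg\,min}_{\psi \text{ convex}} J_k(\psi).
\end{aligned}
```

The entropy term follows from the change of variables $\rho^{(k+1)}(\nabla\psi(x)) = \rho^{(k)}(x) / \det \nabla^2\psi(x)$. The last summand of $J_k$ does not depend on $\psi$.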
### Main contributions of the paper:
1. **Scalable method**: Discretizing the JKO steps with ICNNs yields objectives that can be optimized by stochastic gradient descent.
2. **No domain discretization or particles**: Unlike traditional methods, the approach neither grids the space nor simulates particles, so it can handle higher-dimensional problems.
3. **Sampling and density evaluation**: At each time step, the method can both draw samples from the measure and evaluate its probability density.
4. **Applications**: The paper demonstrates the algorithm on diffusions governed by the Fokker-Planck equation and applies it to sampling from unnormalized densities and to nonlinear filtering.
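The sampling and density claim in item 3 can be illustrated in one dimension. Suppose a step's map is $T = \nabla\psi$ for a strictly convex $\psi$. Samples of the next measure are obtained by pushing samples of the current one through $T$. The density follows from the change-of-variables formula $p_Y(y) = p_X(x) / \det \nabla^2\psi(x)$ with $x = T^{-1}(y)$.

The sketch below makes three simplifying assumptions of ours:
- The current measure is a standard normal.
- A hand-picked potential $\psi(x) = x^2/2 + \log(1 + e^x)$ stands in for a trained ICNN.
- $T$ is inverted by bisection. In 1-D this is equivalent to minimizing the convex function $\psi(x) - xy$.

All function names are hypothetical.

```python
import math
import random

# Hypothetical potential psi(x) = x^2/2 + softplus(x): strictly convex,
# standing in for a trained ICNN (assumption, not the paper's network).

def transport_map(x):
    """T(x) = psi'(x) = x + sigmoid(x), a strictly increasing map."""
    return x + 1.0 / (1.0 + math.exp(-x))

def psi_second_derivative(x):
    """psi''(x) = 1 + sigmoid(x) * (1 - sigmoid(x)) > 0, the 1-D Hessian determinant."""
    s = 1.0 / (1.0 + math.exp(-x))
    return 1.0 + s * (1.0 - s)

def inverse_map(y, lo=-50.0, hi=50.0, iters=100):
    """Solve T(x) = y by bisection; T is monotone, so the root is unique."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if transport_map(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def source_density(x):
    """Density of the current measure, assumed to be N(0, 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def pushforward_density(y):
    """Change of variables: p_Y(y) = p_X(x) / psi''(x), where x = T^{-1}(y)."""
    x = inverse_map(y)
    return source_density(x) / psi_second_derivative(x)

def sample_pushforward(n, seed=0):
    """Draw x ~ N(0, 1) and push each sample through T."""
    rng = random.Random(seed)
    return [transport_map(rng.gauss(0.0, 1.0)) for _ in range(n)]

if __name__ == "__main__":
    ys = sample_pushforward(5)
    print([round(pushforward_density(y), 4) for y in ys])
```

In higher dimensions the same formula holds with the log-determinant of the Hessian of $\psi$. There, the inverse map is obtained by minimizing the convex function $\psi(x) - \langle x, y \rangle$ rather than by bisection.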
### Specific technical details:
- **Wasserstein gradient flows**: The paper first reviews the basic concepts, in particular the correspondence between the Fokker-Planck equation and a Wasserstein gradient flow.
- **JKO scheme**: The JKO scheme approximates the continuous gradient flow by discretizing it in time, but each step requires solving a difficult optimization problem.
- **Use of ICNNs**: By Brenier's theorem, the optimal transport map from the current measure can be written as the gradient of a convex function. The paper therefore parameterizes this convex potential with an ICNN. Each JKO step then becomes an optimization over convex functions, and the Wasserstein distance does not have to be computed directly.
- **Stochastic optimization**: The ICNN parameters are trained with stochastic gradient descent (SGD), which lets the method scale to large problems.
- **Density computation**: The paper also explains in detail how to compute the density of the measure at each time step, which many practical applications require.
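The following sketch shows how Brenier's parameterization turns each JKO step into an optimization over a convex potential. It runs the JKO scheme on a 1-D Ornstein-Uhlenbeck-type example.

The setup is our own illustration, not the paper's:
- The potential is $\Phi(x) = x^2/2$ and the current measure is Gaussian, $N(m, s^2)$.
- Instead of an ICNN, the convex potential is restricted to $\psi(x) = a x^2/2 + bx$ with $a > 0$. The map is then $\nabla\psi(x) = ax + b$ and $\log\det\nabla^2\psi = \log a$.
- Every expectation in the objective depends only on $m$ and $s^2$. Exact moments therefore replace minibatch estimates, and plain gradient descent stands in for SGD.

In this Gaussian, quadratic case the affine family contains the exact JKO step.

```python
# Hypothetical JKO sketch: Phi(x) = x^2 / 2, Gaussian measures, and affine maps
# T(x) = a*x + b (gradient of the convex psi(x) = a x^2 / 2 + b x, a > 0)
# used instead of an ICNN (assumption for illustration).

def jko_step(m, s2, h=0.1, beta=1.0, lr=0.01, iters=3000):
    """One JKO step from N(m, s2); returns (mean, variance) of the next measure.

    Objective, up to a constant independent of (a, b):
      J(a, b) = E[(x - T(x))^2] / (2h) + E[Phi(T(x))] - log(a) / beta
    """
    second_moment = s2 + m * m  # E[x^2] under N(m, s2)
    a, b = 1.0, 0.0  # start from the identity map
    for _ in range(iters):
        # Exact gradients of J, using E[x] = m and E[x^2] = second_moment.
        grad_a = ((-(1.0 - a) * second_moment + b * m) / h
                  + (a * second_moment + b * m)
                  - 1.0 / (beta * a))
        grad_b = (-(1.0 - a) * m + b) / h + (a * m + b)
        a -= lr * grad_a
        b -= lr * grad_b
    # The pushforward of N(m, s2) under T(x) = a*x + b is N(a*m + b, a^2 * s2).
    return a * m + b, a * a * s2

def jko_flow(m0, s2_0, steps, h=0.1, beta=1.0):
    """Iterate JKO steps; returns the (mean, variance) after each step."""
    history = []
    m, s2 = m0, s2_0
    for _ in range(steps):
        m, s2 = jko_step(m, s2, h=h, beta=beta)
        history.append((m, s2))
    return history

if __name__ == "__main__":
    traj = jko_flow(2.0, 0.25, steps=50)
    print(traj[0], traj[-1])
```

Setting the $b$-gradient to zero shows that the mean obeys $m_{k+1} = m_k / (1 + h)$, an implicit-Euler step of $dm/dt = -m$. The iterates approach the stationary Gibbs measure $N(0, 1/\beta)$. In the paper's setting, $\psi$ is an ICNN and the expectations are estimated from samples, which is where SGD comes in.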
### Experimental results:
- **Convergence to the steady-state solution**: Experiments show that the method converges to the steady-state solution across different dimensions and outperforms the traditional particle-simulation method in high dimensions.
- **Ornstein-Uhlenbeck process**: On the Ornstein-Uhlenbeck process, the method accurately approximates the process dynamics, including in high dimensions.
- **Bayesian logistic regression**: The paper applies the method to Bayesian logistic regression, showing that it can effectively sample from the unnormalized posterior distribution.
- **Nonlinear filtering**: Finally, the paper applies the method to nonlinear filtering, where it computes the predictive distribution given the observed data.
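Sampling an unnormalized density $\pi(w) \propto e^{-\Phi(w)}$ fits the Fokker-Planck framework by setting $\beta = 1$. The stationary measure of the flow is proportional to $e^{-\beta\Phi}$, so running JKO steps long enough targets $\pi$. For Bayesian logistic regression, $\Phi$ is the negative log-likelihood plus the negative log-prior.

The sketch below computes $\Phi$ and its gradient. It assumes labels in $\{0, 1\}$ and an isotropic Gaussian prior with variance `prior_var`; the paper's exact prior and data may differ, and the function names are hypothetical.

```python
import math

# Assumptions: labels y_i in {0, 1}; prior w ~ N(0, prior_var * I).

def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def potential(w, X, y, prior_var=1.0):
    """Phi(w) = -log p(y | X, w) - log p(w), dropping w-independent constants."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi))
        # -[y log sigmoid(z) + (1 - y) log(1 - sigmoid(z))] = log(1 + e^z) - y z
        total += log1pexp(z) - yi * z
    return total + sum(wj * wj for wj in w) / (2.0 * prior_var)

def grad_potential(w, X, y, prior_var=1.0):
    """Gradient of Phi: sum_i (sigmoid(x_i . w) - y_i) x_i + w / prior_var."""
    g = [wj / prior_var for wj in w]
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi))
        r = sigmoid(z) - yi
        for j, xj in enumerate(xi):
            g[j] += r * xj
    return g

if __name__ == "__main__":
    X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]]  # toy features with a bias column
    y = [1, 0, 1]
    print(potential([0.0, 0.0], X, y))  # equals 3 * log(2) at w = 0
```

Only $\Phi$ (through samples pushed by each step's map) enters the JKO objective. The unknown normalizing constant of the posterior therefore never needs to be computed.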
In conclusion, the paper proposes an efficient, scalable method for approximating Wasserstein gradient flows that is especially well suited to machine-learning tasks in high-dimensional spaces.