Abstract: Optimal transport plays a fundamental role in deep learning. Natural data sets have intrinsic patterns, which can be summarized as the manifold distribution principle: a natural class of data can be treated as a probability distribution on a low-dimensional manifold embedded in a high-dimensional ambient space. A deep learning system mainly accomplishes two tasks: manifold learning and probability distribution learning. Given a manifold X, all the probability measures on X form an infinite-dimensional manifold, the so-called Wasserstein space P(X). Optimal transport equips the Wasserstein space with a Riemannian metric, the so-called Wasserstein metric, and defines Otto's calculus, so that variational optimization can be carried out in P(X). A deep learning system learns the distribution by optimizing functionals in the Wasserstein space P(X); optimal transport therefore lays the theoretical foundation for deep learning. This work introduces the theory of optimal transport and the profound relation between Brenier's theorem and Alexandrov's theorem in differential geometry via the Monge-Ampère equation. We give a variational proof of Alexandrov's theorem and convert the proof into a computational algorithm for computing optimal transport maps. The algorithm is based on computational geometry and can be generalized to the general manifold setting. Optimal transport theory and algorithms have been extensively applied in generative adversarial network (GAN) models. In a GAN model, the generator computes the optimal transport map (OT map), while the discriminator computes the Wasserstein distance between the generated data distribution and the real data distribution. Optimal transport theory shows that the competition between the generator and the discriminator is completely unnecessary and should be replaced by collaboration. Furthermore, the regularity theory of optimal transport maps explains the intrinsic reason for mode collapse.
A novel generative model is introduced, which uses an autoencoder (AE) for manifold learning and an OT map for probability distribution transformation. This AE-OT model improves theoretical rigor and transparency, as well as computational stability and efficiency; in particular, it eliminates mode collapse.
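To make the notions of OT map and Wasserstein distance concrete, the following is a minimal sketch for the simplest possible case, not the computational-geometry algorithm of the paper. It assumes two sets of equally weighted samples on the real line and the quadratic cost; in that setting the optimal map is the monotone rearrangement, which can be obtained by sorting. The function name `ot_map_1d` and the sample values are illustrative assumptions.

```python
from math import sqrt


def ot_map_1d(source, target):
    """Monotone (sorted) matching of equal-weight 1D samples.

    Assumption: both lists have the same length and every sample carries
    the same mass, so the quadratic-cost optimal map sends the k-th
    smallest source point to the k-th smallest target point.
    Returns (mapping, w2), where mapping[i] is the image of source[i]
    and w2 is the 2-Wasserstein distance between the empirical measures.
    """
    if len(source) != len(target):
        raise ValueError("this sketch assumes equally many samples")

    # Indices of the source samples in increasing order of value.
    src_order = sorted(range(len(source)), key=lambda i: source[i])
    tgt_sorted = sorted(target)

    mapping = [0.0] * len(source)
    for rank, i in enumerate(src_order):
        mapping[i] = tgt_sorted[rank]

    # Mean squared displacement is the squared 2-Wasserstein distance.
    cost = sum((x - y) ** 2 for x, y in zip(source, mapping)) / len(source)
    return mapping, sqrt(cost)


source = [0.0, 2.0, 1.0]
target = [11.0, 10.0, 12.0]
mapping, w2 = ot_map_1d(source, target)
print(mapping, w2)
```

In one dimension a monotone map is the derivative of a convex function, which is the 1D instance of the structure Brenier's theorem describes. In higher dimensions sorting no longer applies and a genuine solver is required.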

Variational Transport: A Convergent Particle-Based Algorithm for Distributional Optimization

DPVI: A Dynamic-Weight Particle-Based Variational Inference Framework

Moreau-Yoshida Variational Transport: A General Framework For Solving Regularized Distributional Optimization Problems

A Particle-Based Algorithm for Distributional Optimization on Constrained Domains via Variational Transport and Mirror Descent

Pathwise Derivatives for Multivariate Distributions

Variational Analysis in the Wasserstein Space

An optimal-transport finite-particle method for mass diffusion

Covariance-modulated optimal transport and gradient flows

Generalized Variational Inference Via Optimal Transport

Unifying Distributionally Robust Optimization via Optimal Transport Theory

Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Scaling Algorithms for Unbalanced Transport Problems

Particle-based Variational Inference with Generalized Wasserstein Gradient Flow

Stochastic variance-reduced Gaussian variational inference on the Bures-Wasserstein manifold

Mean-field Variational Inference via Wasserstein Gradient Flow

Optimal Transport for Generative Models

Bridging the Gap Between Variational Inference and Wasserstein Gradient Flows

Distributionally Robust Optimization via Iterative Algorithms in Continuous Probability Spaces

Using Linearized Optimal Transport to Predict the Evolution of Stochastic Particle Systems

On Scalable and Efficient Computation of Large Scale Optimal Transport