Abstract: Flow-based generative models enjoy certain advantages in data generation and likelihood computation, and have recently shown competitive empirical performance. Compared to the accumulating theoretical studies on related score-based diffusion models, analysis of flow-based models, which are deterministic in both forward (data-to-noise) and reverse (noise-to-data) directions, remains sparse. In this paper, we provide a theoretical guarantee of generating the data distribution by a progressive flow model, the so-called JKO flow model, which implements the Jordan-Kinderlehrer-Otto (JKO) scheme in a normalizing flow network. Leveraging the exponential convergence of the proximal gradient descent (GD) in Wasserstein space, we prove the Kullback-Leibler (KL) guarantee of data generation by a JKO flow model to be $O(\varepsilon^2)$ when using $N \lesssim \log (1/\varepsilon)$ many JKO steps ($N$ Residual Blocks in the flow), where $\varepsilon$ is the error in the per-step first-order condition. The only assumption on the data density is a finite second moment, and the theory extends to data distributions without density and to the case of inversion errors in the reverse process, where we obtain KL-$W_2$ mixed error guarantees. The non-asymptotic convergence rate of the JKO-type $W_2$-proximal GD is proved for a general class of convex objective functionals that includes the KL divergence as a special case, which can be of independent interest. The analysis framework can extend to other first-order Wasserstein optimization schemes applied to flow-based generative models.
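As a toy illustration of the exponential convergence that the abstract refers to, the sketch below runs a JKO-type $W_2$-proximal iteration in one dimension. It is not the paper's flow network: it assumes the iterates stay in the Gaussian family $N(m, s^2)$, takes the target to be $N(0,1)$, and uses the closed form $W_2^2 = (m - m')^2 + (s - s')^2$ between 1D Gaussians, so each proximal step can be solved exactly. The function names and the step size `gamma` are hypothetical choices made for this example.

```python
import math

def kl_to_std_normal(m, s):
    """KL( N(m, s^2) || N(0, 1) ) in one dimension."""
    return 0.5 * (s * s + m * m - 1.0) - math.log(s)

def jko_step_gaussian(m, s, gamma):
    """One JKO (W_2-proximal) step, restricted to 1D Gaussians (an assumption).

    Minimizes over (m', s'):
        KL(N(m', s'^2) || N(0, 1)) + ((m' - m)^2 + (s' - s)^2) / (2 * gamma)
    Setting the gradient to zero gives
        m' = m / (1 + gamma)
        (gamma + 1) * s'^2 - s * s' - gamma = 0   (take the positive root)
    """
    m_new = m / (1.0 + gamma)
    s_new = (s + math.sqrt(s * s + 4.0 * gamma * (gamma + 1.0))) / (2.0 * (gamma + 1.0))
    return m_new, s_new

def run_jko(m0, s0, gamma, n_steps):
    """Iterate the proximal step and record the KL divergence after each step."""
    m, s = m0, s0
    kls = [kl_to_std_normal(m, s)]
    for _ in range(n_steps):
        m, s = jko_step_gaussian(m, s, gamma)
        kls.append(kl_to_std_normal(m, s))
    return kls

# Hypothetical starting point and step size, chosen only for illustration.
kl_history = run_jko(m0=3.0, s0=0.5, gamma=0.5, n_steps=20)
```

In this restricted setting the recorded KL values decrease at every step and shrink roughly geometrically, which is consistent with needing on the order of $\log(1/\varepsilon)$ steps to reach a small target error.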

A convergence result of a continuous model of deep learning via Łojasiewicz--Simon inequality

Convergence of SGD for Training Neural Networks with Sliced Wasserstein Losses

Convergence and non-convergence in a nonlocal gradient flow

Convergence Analysis of a Class of Nonsmooth Gradient Systems

Convergence of stochastic gradient descent under a local Lojasiewicz condition for deep neural networks

Convergence analysis of OT-Flow for sample generation

Convergence analysis of discrete time recurrent neural networks for linear variational inequality problem

Convergence Analysis of Gradient Algorithms on Riemannian Manifolds Without Curvature Constraints and Application to Riemannian Mass

Convergence of continuous-time stochastic gradient descent with applications to linear deep neural networks

A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks

On the global convergence of Wasserstein gradient flow of the Coulomb discrepancy

Gradient flows on graphons: existence, convergence, continuity equations

On Convergence of Training Loss Without Reaching Stationary Points

Global Convergence Analysis of Deep Linear Networks with A One-neuron Layer

Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions

Local convergence rates for Wasserstein gradient flows and McKean-Vlasov equations with multiple stationary solutions

A Sharp Convergence Theory for The Probability Flow ODEs of Diffusion Models

High Probability Convergence Bounds for Non-convex Stochastic Gradient Descent with Sub-Weibull Noise

Adversarial flows: A gradient flow characterization of adversarial attacks

On Non-local Convergence Analysis of Deep Linear Networks

Convergence of flow-based generative models via proximal gradient descent in Wasserstein space