Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models

Gen Li, Yuting Wei, Yuxin Chen, Yuejie Chi
2024-03-07
Abstract: Diffusion models, which convert noise into new data instances by learning to reverse a Markov diffusion process, have become a cornerstone in contemporary generative modeling. While their practical power has now been widely recognized, the theoretical underpinnings remain far from mature. In this work, we develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models in discrete time, assuming access to $\ell_2$-accurate estimates of the (Stein) score functions. For a popular deterministic sampler (based on the probability flow ODE), we establish a convergence rate proportional to $1/T$ (with $T$ the total number of steps), improving upon past results; for another mainstream stochastic sampler (i.e., a type of the denoising diffusion probabilistic model), we derive a convergence rate proportional to $1/\sqrt{T}$, matching the state-of-the-art theory. Imposing only minimal assumptions on the target data distribution (e.g., no smoothness assumption is imposed), our results characterize how $\ell_2$ score estimation errors affect the quality of the data generation processes. In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach without resorting to toolboxes for SDEs and ODEs. Further, we design two accelerated variants, improving the convergence to $1/T^2$ for the ODE-based sampler and $1/T$ for the DDPM-type sampler, which might be of independent theoretical and empirical interest.
Machine Learning, Information Theory, Statistics Theory
What problem does this paper attempt to address?
The paper addresses the lack of non-asymptotic convergence guarantees for diffusion models that generate new data instances in discrete time. Specifically, it focuses on the following points:

1. **Non-Asymptotic Convergence Guarantees**:
   - For a popular deterministic sampler (based on the probability flow ODE), the paper proves that the number of steps needed to reach error \(\epsilon\) scales as \(O(\frac{1}{\epsilon})\), up to polynomial dependence on the dimension. This improves upon past results.
   - For another mainstream stochastic sampler (a DDPM-type sampler), the paper establishes an iteration complexity of \(O(\frac{1}{\epsilon^2})\) through a new non-asymptotic analysis framework, which matches the state-of-the-art theory.
2. **Accounting for Score Estimation Errors**:
   - For the deterministic sampler, the analysis shows that the total variation (TV) distance is proportional to the score estimation error and the associated mean Jacobian error. This is the first result that accounts for score estimation errors for such deterministic samplers in discrete time.
3. **Elementary Non-Asymptotic Analysis Framework**:
   - From a technical perspective, the paper proposes a fully non-asymptotic analysis framework that works directly with the discrete-time processes rather than with their continuous-time limits. The approach does not rely on toolboxes for SDEs or ODEs, which lowers the technical barrier to understanding diffusion models and makes the framework more versatile.
4. **Accelerating the Data Generation Process**:
   - To further speed up sampling, the paper develops two accelerated variants, one for the deterministic sampler and one for the stochastic sampler, each using a small number of additional estimated quantities.
   - These variants converge faster, at rates proportional to \(1/T^2\) and \(1/T\) respectively. Equivalently, the deterministic and stochastic variants need \(O(\frac{1}{\sqrt{\epsilon}})\) and \(O(\frac{1}{\epsilon})\) steps to reach error \(\epsilon\).

In summary, the paper aims to better understand and improve diffusion-based generation by establishing a non-asymptotic theory, with new insights into the effect of score estimation errors and into accelerated samplers.
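
The step counts quoted in this summary come from inverting the convergence rates in the abstract: if the error after \(T\) steps behaves like \(C/T^p\), then reaching error \(\epsilon\) takes roughly \((C/\epsilon)^{1/p}\) steps. The sketch below illustrates that arithmetic only. The function names and the constant `constant` (set to 1 here) are assumptions made for illustration; the actual constants, dimension factors, and log terms hidden by the \(O(\cdot)\) notation are not taken from the paper.

```python
import math


def steps_for_error(eps: float, rate_exponent: float, constant: float = 1.0) -> int:
    """Smallest T with constant / T**rate_exponent <= eps.

    `constant` is a hypothetical stand-in for everything the O(.) hides
    (dimension factors, log terms); it is not a value from the paper.
    """
    if eps <= 0 or rate_exponent <= 0 or constant <= 0:
        raise ValueError("eps, rate_exponent and constant must be positive")
    exact = (constant / eps) ** (1.0 / rate_exponent)
    # Small slack so floating-point round-off does not add a spurious step.
    return max(1, math.ceil(exact - 1e-9))


# Rate exponents p in "error ~ 1 / T**p", as stated in the abstract.
RATE_EXPONENTS = {
    "ODE sampler": 1.0,
    "DDPM-type sampler": 0.5,
    "accelerated ODE sampler": 2.0,
    "accelerated DDPM-type sampler": 1.0,
}


def step_table(eps: float) -> dict[str, int]:
    """Map each sampler to the step count implied by its rate (constant = 1)."""
    return {name: steps_for_error(eps, p) for name, p in RATE_EXPONENTS.items()}


if __name__ == "__main__":
    for name, steps in step_table(0.01).items():
        print(f"{name}: {steps} steps")
```

With the illustrative constant of 1 and a target error of 0.01, the four rates translate into 100, 10000, 10, and 100 steps respectively, which shows why moving from a \(1/\sqrt{T}\) rate to a \(1/T\) or \(1/T^2\) rate matters in practice. Real step counts depend on the hidden constants and on the score estimation error, so these numbers only indicate scaling.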