Abstract: Diffusion models are a class of generative models that serve to establish a stochastic transport map between an empirically observed, yet unknown, target distribution and a known prior. Despite their remarkable success in real-world applications, a theoretical understanding of their generalization capabilities remains underdeveloped. This work embarks on a comprehensive theoretical exploration of the generalization attributes of diffusion models. We establish theoretical estimates of the generalization gap that evolves in tandem with the training dynamics of score-based diffusion models, suggesting a polynomially small generalization error ($O(n^{-2/5}+m^{-4/5})$) on both the sample size $n$ and the model capacity $m$, evading the curse of dimensionality (i.e., not exponentially large in the data dimension) when early-stopped. Furthermore, we extend our quantitative analysis to a data-dependent scenario, wherein target distributions are portrayed as a succession of densities with progressively increasing distances between modes. This precisely elucidates the adverse effect of "modes shift" in ground truths on the model generalization. Moreover, these estimates are not solely theoretical constructs but have also been confirmed through numerical simulations. Our findings contribute to the rigorous understanding of diffusion models' generalization properties and provide insights that may guide practical applications.

What problem does this paper attempt to address?

The paper studies the generalization properties of diffusion models, generative models that establish a stochastic transport map between a known prior and an unknown target distribution observed only through samples. Although these models perform well in real-world applications, theoretical understanding of their generalization capability remains limited. The main contributions of the paper are:

1. An upper bound on the generalization gap along the training dynamics of score-based diffusion models. With early stopping, the generalization error is polynomially small, $O(n^{-2/5}+m^{-4/5})$, in the sample size $n$ and the model capacity $m$, and so evades the curse of dimensionality (the bound is not exponentially large in the data dimension).
2. An extension of this quantitative analysis to a data-dependent setting in which the target distributions form a sequence of densities with progressively increasing distances between modes. This reveals the adverse effect of "modes shift" in the ground truth on the model's generalization.
3. Numerical simulations that confirm the theoretical estimates, so the results offer both a rigorous mathematical understanding and practical guidance on the generalization properties of diffusion models.

The motivation is both theoretical and practical. Theoretically, it is necessary to understand whether diffusion models suffer from overfitting as other models do. In practice, generalization capability is tied to privacy and copyright risks, since a model may leak information about its training samples and be vulnerable to specific attacks.
The structure of the paper includes related work, problem definition, main results, and experimental verification, with detailed explanations of forward perturbation, backward sampling processes, loss objectives, and training dynamics, as well as relevant theorems and proofs.
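
To get a feel for the rate $O(n^{-2/5}+m^{-4/5})$ quoted above, the following minimal sketch evaluates the two terms with every hidden constant set to 1. That choice is an assumption made purely for illustration: the actual constants in the paper's bound are not given in this summary and may depend on the data dimension and other quantities. The function names `rate` and `balanced_capacity` are hypothetical helpers, not part of the paper.

```python
def rate(n: int, m: int) -> float:
    """Evaluate n^(-2/5) + m^(-4/5) with all constants set to 1 (illustrative assumption)."""
    return n ** (-2 / 5) + m ** (-4 / 5)


def balanced_capacity(n: int) -> float:
    """Capacity m at which the two terms are equal.

    Setting m^(-4/5) = n^(-2/5) gives m = n^(1/2).
    """
    return n ** 0.5


if __name__ == "__main__":
    # Print the sample size, the balancing capacity, and the resulting rate.
    for n in (10**2, 10**4, 10**6):
        m = round(balanced_capacity(n))
        print(f"n={n:>8}  m={m:>5}  rate={rate(n, m):.5f}")
```

Under this simplification the two terms balance at $m = n^{1/2}$: past that point, adding capacity without adding samples leaves the $n^{-2/5}$ term as the dominant one. This only describes the shape of the rate, not the size of the actual generalization gap.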

On the Generalization Properties of Diffusion Models

On the Generalization of Diffusion Model

Understanding Generalizability of Diffusion Models Requires Rethinking the Hidden Gaussian Structure

The Emergence of Reproducibility and Generalizability in Diffusion Models

An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization

From memorization to generalization: a theoretical framework for diffusion-based generative models

Diffusion Models: A Comprehensive Survey of Methods and Applications

Where to Diffuse, How to Diffuse, and How to Get Back: Automated Learning for Multivariate Diffusions

Towards a Mechanistic Explanation of Diffusion Model Generalization

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

Dynamical Regimes of Diffusion Models

Scaling Riemannian Diffusion Models

Lecture Notes in Probabilistic Diffusion Models

Convergence Analysis of Discrete Diffusion Model: Exact Implementation through Uniformization

Improved Convergence Rate for Diffusion Probabilistic Models

$O(d/T)$ Convergence Theory for Diffusion Probabilistic Models under Minimal Assumptions

Transfer Learning for Diffusion Models

Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models

Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models

Unveil Conditional Diffusion Models with Classifier-free Guidance: A Sharp Statistical Theory