Abstract: Score-based diffusion models, which generate new data by learning to reverse a diffusion process that perturbs data from the target distribution into noise, have achieved remarkable success across various generative tasks. Despite their superior empirical performance, existing theoretical guarantees are often constrained by stringent assumptions or suboptimal convergence rates. In this paper, we establish a fast convergence theory for a popular SDE-based sampler under minimal assumptions. Our analysis shows that, provided $\ell_{2}$-accurate estimates of the score functions, the total variation distance between the target and generated distributions is upper bounded by $O(d/T)$ (ignoring logarithmic factors), where $d$ is the data dimensionality and $T$ is the number of steps. This result holds for any target distribution with finite first-order moment. To our knowledge, this improves upon existing convergence theory for both the SDE-based sampler and another ODE-based sampler, while imposing minimal assumptions on the target data distribution and score estimates. This is achieved through a novel set of analytical tools that provides a fine-grained characterization of how the error propagates at each step of the reverse process.
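To make the reverse process concrete, here is a minimal Python sketch of a DDPM-style stochastic (SDE-based) sampler. It assumes a Gaussian target \(X_0 \sim \mathcal{N}(0, 9 I_d)\), chosen only because its true score is known in closed form, and a forward process \(X_t = \sqrt{\alpha_t} X_{t-1} + \sqrt{1-\alpha_t} W_t\). The linear \(\beta_t = 1 - \alpha_t\) schedule, the update coefficients and the function names (`linear_beta_schedule`, `gaussian_true_score`, `sde_sample`) are illustrative assumptions, not the paper's exact sampler or step-size schedule.

```python
import math
import random


def linear_beta_schedule(T, beta_min=1e-4, beta_max=0.02):
    """Hypothetical DDPM-style schedule: beta_t = 1 - alpha_t for t = 1, ..., T."""
    return [beta_min + (beta_max - beta_min) * k / (T - 1) for k in range(T)]


def gaussian_true_score(betas, sigma2):
    """Exact score of X_t when X_0 ~ N(0, sigma2 * I) and
    X_t = sqrt(alpha_t) X_{t-1} + sqrt(1 - alpha_t) W_t, so that X_t ~ N(0, v_t * I)."""
    v, abar = [], 1.0
    for beta in betas:
        abar *= 1.0 - beta
        v.append(abar * sigma2 + 1.0 - abar)  # v_t = abar_t * sigma2 + 1 - abar_t

    def score(t, y):
        return [-yi / v[t - 1] for yi in y]  # gradient of log N(y; 0, v_t I)

    return score, v


def sde_sample(score, betas, d, rng):
    """Return one draw of Y_1 from a DDPM-style stochastic reverse process.

    Starts at Y_T ~ N(0, I) and, for t = T, ..., 2, applies (assumed form)
        Y_{t-1} = (Y_t + (1 - alpha_t) s_t(Y_t)) / sqrt(alpha_t) + sqrt(1 - alpha_t) Z_t.
    """
    T = len(betas)
    y = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for t in range(T, 1, -1):
        beta = betas[t - 1]
        s = score(t, y)
        y = [(yi + beta * si) / math.sqrt(1.0 - beta) + math.sqrt(beta) * rng.gauss(0.0, 1.0)
             for yi, si in zip(y, s)]
    return y


# Toy check: with the exact score, Var(Y_1) should be close to Var(X_1) = v_1.
rng = random.Random(0)
betas = linear_beta_schedule(1000)
score, v = gaussian_true_score(betas, sigma2=9.0)
samples = [sde_sample(score, betas, d=2, rng=rng) for _ in range(1000)]
coords = [c for sample in samples for c in sample]
mean = sum(coords) / len(coords)
var = sum((c - mean) ** 2 for c in coords) / len(coords)
print(f"sample variance {var:.2f} vs Var(X_1) = {v[0]:.2f}")
```

Replacing the closed-form `score` with a learned estimate \(s_t\) gives the practical sampler whose error the paper bounds; the Gaussian target is used here only so the output can be checked against \(\mathrm{Var}(X_1)\).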
What problem does this paper attempt to address?
The paper aims to establish a fast convergence theory for diffusion probabilistic models under minimal assumptions. The authors focus on samplers based on stochastic differential equations (SDEs): assuming only that the target distribution has a finite first-order moment and that \(\ell_2\)-accurate score estimates are available, they derive an upper bound on the total variation (TV) distance between the generated distribution and the target distribution. The key contributions are as follows:
1. **Fast Convergence Rate**: With perfect score estimates, the authors prove that the TV distance for the SDE-based sampler converges at rate \(O\left(\frac{d}{T}\right)\) (up to logarithmic factors), a significant improvement over the previous best rate of \(O\left(\sqrt{\frac{d}{T}}\right)\). Moreover, the result holds for arbitrary \(T\) and \(d\), not only in the regime \(T \gg d^2\).
2. **Minimal Assumptions**: The theory only requires the target distribution to have a finite first-order moment, the weakest data assumption in the current literature. It also requires only \(\ell_2\)-accurate score estimates, a much weaker condition than the Jacobian-accuracy requirements imposed in other works.
3. **Stability**: When the score estimates are imperfect, the performance of the SDE-based sampler degrades gracefully: the bound grows only linearly in the score error \(\epsilon_{\text{score}}\) (up to a \(\sqrt{\log T}\) factor). In contrast, the convergence bounds for ODE-based samplers are more sensitive to imperfect score estimates.
4. **Error Metric**: The theory provides a convergence guarantee for the TV distance between the generated distribution \(p_{Y_1}\) and the distribution \(p_{X_1}\) of the forward process after one step, rather than directly for the initial data distribution \(p_{X_0}\). Since the distributions of \(X_1\) and \(X_0\) are very close, \(TV(p_{X_1}, p_{Y_1})\) is an effective error metric.
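The practical gap between the two rates can be seen with back-of-the-envelope arithmetic. Dropping constants and logarithmic factors, reaching \(TV \leq \varepsilon\) needs \(T \gtrsim d/\varepsilon\) steps under an \(O(d/T)\) rate but \(T \gtrsim d/\varepsilon^2\) under \(O(\sqrt{d/T})\). The helper `steps_needed` is a hypothetical illustration of that arithmetic only, not a step count the paper derives.

```python
import math
from fractions import Fraction


def steps_needed(d, eps):
    """Steps T with rate <= eps, ignoring constants and log factors.

    rate d / T         ->  T >= d / eps
    rate sqrt(d / T)   ->  T >= d / eps**2
    eps is converted to an exact Fraction (pass e.g. "0.1") so ceil() is exact.
    """
    eps = Fraction(eps)
    return math.ceil(d / eps), math.ceil(d / eps ** 2)


# A 32x32 RGB image has d = 3072.
fast, slow = steps_needed(3072, "0.1")
print(f"O(d/T): {fast} steps, O(sqrt(d/T)): {slow} steps")
```

For \(d = 3072\) and \(\varepsilon = 0.1\) this gives 30,720 versus 307,200 steps, a factor of \(1/\varepsilon\) in favour of the faster rate.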
### Formula Summary
- **Main Convergence Theorem (TV distance bound)**:
\[
TV(p_{X_1}, p_{Y_1}) \leq c \frac{d \log^3 T}{T} + c \epsilon_{\text{score}} \sqrt{\log T}
\]
where \(c > 0\) denotes a universal constant.
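The two terms of the bound behave differently in \(T\), which a small sketch makes visible. It evaluates the discretization term \(c\,d\log^3 T/T\) and the estimation term \(c\,\epsilon_{\text{score}}\sqrt{\log T}\) separately, under the loud assumption \(c = 1\): the theorem leaves \(c\) unspecified, so only the scaling is meaningful, not the printed values. The function name `tv_bound_terms` is hypothetical.

```python
import math


def tv_bound_terms(d, T, eps_score, c=1.0):
    """Return (discretization, estimation) terms of the TV bound.

    c = 1.0 is a placeholder for the unspecified universal constant.
    """
    log_t = math.log(T)
    discretization = c * d * log_t ** 3 / T       # decreases in T once log T > 3
    estimation = c * eps_score * math.sqrt(log_t)  # grows slowly with T
    return discretization, estimation


for T in (10**4, 10**5, 10**6, 10**7):
    disc, est = tv_bound_terms(d=64, T=T, eps_score=0.01)
    print(f"T={T:>8}: discretization {disc:.4f}, score error {est:.4f}")
```

Once the averaged score error \(\epsilon_{\text{score}}\) is positive, increasing \(T\) drives the first term toward zero but cannot remove the second, so the score-estimation quality sets the floor of the bound.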
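- **Score Function Estimation Error** (with \(s_t\) the score estimate and \(s^\star_t = \nabla \log p_{X_t}\) the true score at step \(t\)):
\[
\epsilon_{\text{score}}^2 = \frac{1}{T} \sum_{t = 1}^T \mathbb{E}\left[\|s_t(X_t) - s^\star_t(X_t)\|_2^2\right]
\]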
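The definition of \(\epsilon_{\text{score}}^2\) can be checked numerically on a toy case. The sketch below assumes a Gaussian target \(X_0 \sim \mathcal{N}(0, 9 I_d)\), the same linear \(\beta_t\) schedule as a DDPM-style forward process, and a hypothetical imperfect estimator that shrinks the true score by \(1/(1+\delta)\). It estimates \(\epsilon_{\text{score}}^2\) by Monte Carlo over the forward marginals and compares the result with the closed form \(\frac{1}{T}\sum_t d\,(\delta/(1+\delta))^2 / v_t\), where \(v_t\) is the per-coordinate variance of \(X_t\). All names are illustrative.

```python
import math
import random


def forward_variances(betas, sigma2):
    """Return (abar_t, v_t) lists for t = 1..T when X_0 ~ N(0, sigma2 * I)."""
    abars, vs, abar = [], [], 1.0
    for beta in betas:
        abar *= 1.0 - beta
        abars.append(abar)
        vs.append(abar * sigma2 + 1.0 - abar)
    return abars, vs


def estimate_eps_score_sq(score_est, score_true, betas, sigma2, d, n, rng):
    """Monte Carlo estimate of (1/T) * sum_t E ||s_t(X_t) - s*_t(X_t)||_2^2."""
    abars, _ = forward_variances(betas, sigma2)
    T = len(betas)
    total = 0.0
    for t in range(1, T + 1):
        a = abars[t - 1]
        acc = 0.0
        for _ in range(n):
            # Draw X_t = sqrt(abar_t) X_0 + sqrt(1 - abar_t) Z with X_0 ~ N(0, sigma2 I).
            x = [math.sqrt(a) * rng.gauss(0.0, math.sqrt(sigma2))
                 + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for _ in range(d)]
            acc += sum((p - q) ** 2 for p, q in zip(score_est(t, x), score_true(t, x)))
        total += acc / n
    return total / T


T, d, sigma2, delta = 1000, 2, 9.0, 0.1
betas = [1e-4 + (0.02 - 1e-4) * k / (T - 1) for k in range(T)]
_, v = forward_variances(betas, sigma2)


def score_true(t, x):
    return [-xi / v[t - 1] for xi in x]


def score_est(t, x):
    # Hypothetical imperfect estimator: the exact score shrunk by 1 / (1 + delta).
    return [-xi / (v[t - 1] * (1.0 + delta)) for xi in x]


eps2_mc = estimate_eps_score_sq(score_est, score_true, betas, sigma2, d, 200, random.Random(1))
# Closed form for this toy case: E||s_t - s*_t||^2 = d * (delta / (1 + delta))^2 / v_t.
eps2_exact = sum(d * (delta / (1.0 + delta)) ** 2 / vt for vt in v) / T
print(f"eps_score^2: Monte Carlo {eps2_mc:.5f}, exact {eps2_exact:.5f}")
```

Note that the expectation is taken under the forward marginals \(p_{X_t}\), so this quantity measures the training-time \(\ell_2\) error, the only kind of score accuracy the theorem requires.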
### Paper Structure
1. **Introduction**: Introduces the background and importance of score-based generative models (SGMs), as well as the shortcomings of existing theories.
2. **Problem Setting**: Describes in detail the forward process and the reverse process, as well as the definition and estimation of the score function.
3. **Main Results**: Presents the fast convergence theory, stating the convergence rates together with the required assumptions.
4. **Proof**: Introduces auxiliary sequences and intermediate variables to control the discretization error and the estimation error step by step, and then proves the main theorem.
### Conclusion
Through rigorous mathematical analysis, this paper establishes a fast convergence theory under minimal assumptions, marking important progress in the theoretical foundations of diffusion models. The theory not only improves the convergence rate but also relaxes the requirements on the data distribution and the score estimates, making it more broadly applicable.