Abstract: Denoising diffusion models have emerged as a dominant approach for image generation; however, they still suffer from slow convergence in training and color shift issues in sampling. In this paper, we identify that these obstacles can be largely attributed to bias and suboptimality inherent in the default training paradigm of diffusion models. Specifically, we offer theoretical insights that the prevailing constant loss weight strategy in $\epsilon$-prediction of diffusion models leads to biased estimation during the training phase, hindering accurate estimation of original images. To address the issue, we propose a simple but effective weighting strategy derived from the uncovered bias term. Furthermore, we conduct a comprehensive and systematic exploration of the inherent bias problem in terms of its existence, impact and underlying reasons. These analyses contribute to advancing the understanding of diffusion models. Empirical results demonstrate that, by adjusting only the loss weighting strategy, our method markedly improves sample quality and efficiency in both training and sampling. The code is publicly available at https://github.com/yuhuUSTC/Debias.
What problem does this paper attempt to address?
This paper addresses two main problems of diffusion models: **slow convergence during training** and **color shift during sampling**. The authors attribute both problems mainly to the bias and suboptimality inherent in the default training paradigm of diffusion models. The paper develops this argument in four parts:
1. **Identifying the sources of bias**:
- The authors show that in conventional noise prediction ($\epsilon$-prediction) with a constant loss weight, the loss function leads to biased estimation during training, which hinders accurate estimation of the original image $x_0$.
- Concretely, as the timestep $t$ increases, the estimate $\hat{x}_0$ deviates increasingly from the true $x_0$, because the noise-prediction error is amplified by a factor that grows with $t$; at large $t$ this amplified error term grows toward the magnitude of $x_0$ itself.
2. **Proposing improvement schemes**:
- To address this, the authors propose a simple but effective weighting strategy: the loss at each timestep $t$ is weighted by the reciprocal of the square root of the signal-to-noise ratio (SNR):
\[
L=\sum_{t}\mathbb{E}_{x_0,\epsilon}\left[\frac{1}{\sqrt{\text{SNR}(t)}}\|\epsilon - \epsilon_\theta(x_t,t)\|^2\right]
\]
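- Because $1/\sqrt{\text{SNR}(t)}$ increases with the noise level, this weighting emphasizes high-noise timesteps, where errors in $\epsilon_\theta$ are amplified most strongly in $\hat{x}_0$; reducing these errors improves both sample quality and training efficiency.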
3. **Systematically analyzing the bias problem**:
- The authors analyze the bias problem systematically in terms of its existence, impact, and underlying causes.
- The analysis shows that both the difficulty of optimizing the denoising network and the importance of optimizing it accurately vary greatly across timesteps $t$; the initial sampling steps (large $t$) are especially challenging because of their high noise level.
- In addition, the biased estimation causes confused and inconsistent predictions in the first few sampling steps, and these errors propagate through the remaining steps to degrade the final generated sample.
4. **Experimental verification**:
- Experiments show that the proposed weighting strategy significantly improves sample quality and also makes both training and sampling more efficient.
- Compared with existing weighting strategies, the new method achieves better performance with fewer training iterations and fewer sampling steps.
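The amplification described in item 1 can be made explicit with a short derivation. It assumes the standard DDPM forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ and $\text{SNR}(t) = \bar\alpha_t/(1-\bar\alpha_t)$; these definitions are not stated in this summary, and the paper's own derivation may be presented differently.

\[
\begin{aligned}
\hat{x}_0 &= \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t,t)}{\sqrt{\bar\alpha_t}}
= x_0 + \sqrt{\frac{1-\bar\alpha_t}{\bar\alpha_t}}\,\bigl(\epsilon - \epsilon_\theta(x_t,t)\bigr)
= x_0 + \frac{1}{\sqrt{\text{SNR}(t)}}\,\bigl(\epsilon - \epsilon_\theta(x_t,t)\bigr),\\
\|\hat{x}_0 - x_0\|^2 &= \frac{1}{\text{SNR}(t)}\,\|\epsilon - \epsilon_\theta(x_t,t)\|^2 .
\end{aligned}
\]

For the usual noise schedules, $\text{SNR}(t)$ decreases as $t$ grows, so the same $\epsilon$-prediction error is amplified by an increasing factor $1/\sqrt{\text{SNR}(t)}$ in $\hat{x}_0$. A constant weight on the $\epsilon$-loss treats these errors equally across $t$, although their effect on $\hat{x}_0$ is not equal; the weight $1/\sqrt{\text{SNR}(t)}$ in $L$ shifts emphasis toward exactly those high-noise timesteps.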
In summary, through theoretical analysis and experiments, the paper shows that the constant-weight strategy used to train conventional diffusion models leads to biased estimation, and it proposes a new weighting strategy that addresses this bias, improving both the performance and the efficiency of diffusion models.
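A minimal Python sketch of the weighted objective $L$ from item 2 follows. It assumes a DDPM-style linear $\beta$ schedule ($T = 1000$, $\beta$ from $10^{-4}$ to $0.02$, 0-indexed timesteps) and uses plain lists instead of tensors so it runs without a deep learning framework. The schedule and all function names (`linear_alpha_bars`, `debias_weight`, `weighted_eps_loss`, `x0_estimate`) are hypothetical illustrations, not the released Debias code.

```python
import math


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Return alpha_bar_t for t = 0..num_steps-1 under a linear beta schedule.

    The schedule and its defaults are an assumed DDPM-style choice,
    not necessarily what the paper or its code uses.
    """
    alpha_bars = []
    prod = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def snr(alpha_bar):
    """Signal-to-noise ratio SNR(t) = alpha_bar_t / (1 - alpha_bar_t)."""
    return alpha_bar / (1.0 - alpha_bar)


def debias_weight(alpha_bar):
    """Loss weight 1 / sqrt(SNR(t)); it grows as the noise level increases."""
    return 1.0 / math.sqrt(snr(alpha_bar))


def weighted_eps_loss(eps, eps_pred, alpha_bar):
    """One term of L: 1/sqrt(SNR(t)) * ||eps - eps_pred||^2 (vectors as lists)."""
    sq_err = sum((e - p) ** 2 for e, p in zip(eps, eps_pred))
    return debias_weight(alpha_bar) * sq_err


def x0_estimate(x_t, eps_pred, alpha_bar):
    """x0_hat = (x_t - sqrt(1 - alpha_bar) * eps_pred) / sqrt(alpha_bar)."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - b * ep) / a for xt, ep in zip(x_t, eps_pred)]


if __name__ == "__main__":
    alpha_bars = linear_alpha_bars()
    # The weight is small at low noise (t = 0) and large at high noise (t = T - 1).
    for t in (0, 250, 500, 750, 999):
        print(t, round(debias_weight(alpha_bars[t]), 4))
```

In an actual training loop, $t$ would be sampled per example and the weighted terms averaged over a batch of tensors. `x0_estimate` is included only to make the link to item 1 checkable: feeding it a wrong `eps_pred` yields an $\hat{x}_0$ error equal to the $\epsilon$ error scaled by $1/\sqrt{\text{SNR}(t)}$.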