What problem does this paper attempt to address?

### What problem does this paper attempt to solve?

This paper addresses the high variance of gradients during diffusion-model training. The author focuses on score-based models trained with denoising score matching (DSM). The DSM training objective has high variance, which makes optimization difficult. To address this, the author proposes control variates based on Taylor expansions that reduce the variance of both the training objective and its gradient. The main contributions are:

1. **Control variates from Taylor polynomials of arbitrary order**: a general framework for deriving Taylor polynomials of any order as control variates for the training objective and its gradient.
2. **Equivalence of controlling the objective and its gradient**: a proof that controlling the training objective is equivalent to controlling its gradient, which gives future work a theoretical basis.
3. **Empirical importance of regression coefficients**: experiments showing that the choice of regression coefficients matters for how effective the control variates are.
4. **Validation in low-dimensional settings**: empirical studies on low-dimensional problems confirming that the method is effective there.
5. **Study of high-dimensional problems**: an exploration of how the control variates behave in high-dimensional problems, including their limitations.
6. **Limitations of Taylor-based control variates**: evidence that the Taylor expansion falls short for complex networks, especially at large noise levels σ.
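These contributions rely on the classical control-variate mechanism. If \(c(z)\) has a known mean of zero, then \(f(z) - \beta c(z)\) has the same expectation as \(f(z)\) for any \(\beta\). The variance-minimizing choice is \(\beta^* = \mathrm{Cov}(f, c)/\mathrm{Var}(c)\).

The sketch below illustrates only this generic mechanism, not the paper's procedure. It uses a toy integrand \(f(z) = e^{z/2}\) with \(c(z) = z\) and \(z \sim \mathcal{N}(0, 1)\). The function name `control_variate_mean` and the plug-in estimate of β from the same samples are illustrative assumptions.

```python
import math
import random
import statistics


def control_variate_mean(f_vals, c_vals):
    """Estimate E[f] using a control variate c that is ASSUMED to have known mean zero.

    Returns (plain mean, controlled mean, beta, variance ratio). The regression
    coefficient beta = Cov(f, c) / Var(c) is estimated from the same samples.
    """
    mean_f = statistics.fmean(f_vals)
    mean_c = statistics.fmean(c_vals)
    n = len(f_vals)
    cov_fc = sum((f - mean_f) * (c - mean_c) for f, c in zip(f_vals, c_vals)) / (n - 1)
    beta = cov_fc / statistics.variance(c_vals)
    # Subtracting beta * c leaves the expectation unchanged because E[c] = 0.
    controlled = [f - beta * c for f, c in zip(f_vals, c_vals)]
    ratio = statistics.variance(controlled) / statistics.variance(f_vals)
    return mean_f, statistics.fmean(controlled), beta, ratio


rng = random.Random(0)
zs = [rng.gauss(0.0, 1.0) for _ in range(20000)]
f_vals = [math.exp(0.5 * z) for z in zs]  # toy integrand with E[f] = exp(1/8)
c_vals = zs                                # E[z] = 0 exactly for z ~ N(0, 1)
plain, controlled, beta, ratio = control_variate_mean(f_vals, c_vals)
print(plain, controlled, beta, ratio)
```

For this toy pair, Stein's identity gives \(\beta^* = E[z\,e^{z/2}] = e^{1/8}/2 \approx 0.57\), so the estimated β should land close to that value. A badly chosen β can instead increase the variance, which is one way to see why the regression coefficient matters.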
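### Formula summary

- **Training objective function**:
  \[ L_\theta(z, x, \sigma)=\frac{1}{2}\left\|\frac{z}{\sigma}+s_\theta(x + \sigma z)\right\|^2 \]
- **Control variate**:
  \[ C^k_\theta(z, x, \sigma)=\frac{\|z\|^2 - D}{2\sigma^2}+\frac{1}{2}\sum_{|\alpha|\leq k}\sum_{|\rho|\leq k}\frac{\sigma^{|\alpha|+|\rho|}}{\alpha!\,\rho!}\left(z^{\alpha+\rho}-\delta_{\alpha+\rho}\right)\partial^\alpha s_\theta(x)^T\partial^\rho s_\theta(x)+\sum_{|\alpha|\leq k}\frac{\sigma^{|\alpha|-1}}{\alpha!}\left(z^\alpha z - E[z^\alpha z]\right)^T\partial^\alpha s_\theta(x) \]
- **Control variate for controlling the gradient**:
  \[ C^k_{g,\theta}(z, x, \sigma)=\sum_{|\rho|\leq k}\frac{\sigma^{|\rho|-1}}{\rho!}\left(z^\rho z - E[z^\rho z]\right)^T\partial^\rho\partial_\theta s_\theta(x)+\sum_{|\rho|\leq k}\sum_{|\alpha|\leq k}\frac{\sigma^{|\alpha|+|\rho|}}{\alpha!\,\rho!}\left(z^{\alpha+\rho}-\delta_{\alpha+\rho}\right)\partial^\alpha s_\theta(x)^T\partial^\rho\partial_\theta s_\theta(x) \]

The symbols are defined as follows:

- \(z \sim \mathcal{N}(0, I_D)\), where \(D\) is the data dimension.
- \(\alpha\) and \(\rho\) are multi-indices, with \(|\alpha| = \sum_i \alpha_i\), \(\alpha! = \prod_i \alpha_i!\), and \(z^\alpha = \prod_i z_i^{\alpha_i}\).
- \(\partial^\alpha\) differentiates with respect to \(x\).
- \(\delta_{\alpha+\rho} = E[z^{\alpha+\rho}]\), so every bracketed term has zero mean.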
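To make \(C^k_\theta\) concrete, here is a one-dimensional sketch. With \(D = 1\), the multi-indices α and ρ reduce to integers `a` and `r`, and the code subtracts \(\beta C^k\) with the regression coefficient fixed at β = 1.

Everything model-specific is hypothetical:

- The toy score \(s(y) = -y + c\sin(y)\) stands in for \(s_\theta\).
- \(\delta_{\alpha+\rho}\) is read as the Gaussian moment \(E[z^{\alpha+\rho}]\).
- The helper names are assumptions, not the paper's code.

```python
import math
import random
import statistics


def gaussian_moment(m: int) -> float:
    """E[z**m] for z ~ N(0, 1): zero for odd m, (m - 1)!! for even m."""
    if m % 2 == 1:
        return 0.0
    result = 1.0
    for j in range(m - 1, 0, -2):
        result *= j
    return result


def score_derivative(y: float, n: int, c: float) -> float:
    """n-th derivative of the HYPOTHETICAL toy score s(y) = -y + c * sin(y)."""
    linear = -y if n == 0 else (-1.0 if n == 1 else 0.0)
    # d^n/dy^n sin(y) = sin(y + n * pi / 2)
    return linear + c * math.sin(y + n * math.pi / 2)


def dsm_loss(z: float, x: float, sigma: float, c: float) -> float:
    """1-D DSM loss L(z, x, sigma) = 0.5 * (z / sigma + s(x + sigma * z))**2."""
    return 0.5 * (z / sigma + score_derivative(x + sigma * z, 0, c)) ** 2


def taylor_control_variate(z: float, x: float, sigma: float, k: int, c: float) -> float:
    """1-D C^k: the order-k Taylor surrogate of L minus its expectation."""
    ds = [score_derivative(x, a, c) for a in range(k + 1)]
    cv = (z * z - 1.0) / (2.0 * sigma**2)  # (||z||^2 - D) / (2 sigma^2) with D = 1
    for a in range(k + 1):
        for r in range(k + 1):
            coef = sigma ** (a + r) / (math.factorial(a) * math.factorial(r))
            cv += 0.5 * coef * (z ** (a + r) - gaussian_moment(a + r)) * ds[a] * ds[r]
        # Cross term: sigma^(a-1) / a! * (z^a z - E[z^a z]) * s^(a)(x)
        cv += sigma ** (a - 1) / math.factorial(a) * (z ** (a + 1) - gaussian_moment(a + 1)) * ds[a]
    return cv


def variance_ratio(x: float, sigma: float, k: int, c: float, n: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of Var[L - C^k] / Var[L] from n draws of z."""
    rng = random.Random(seed)
    zs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    raw = [dsm_loss(z, x, sigma, c) for z in zs]
    controlled = [l - taylor_control_variate(z, x, sigma, k, c) for l, z in zip(raw, zs)]
    return statistics.variance(controlled) / statistics.variance(raw)


print(variance_ratio(x=0.3, sigma=0.05, k=1, c=0.5))  # nonlinear toy score, small sigma
print(variance_ratio(x=0.3, sigma=0.5, k=1, c=0.0))   # linear score: Taylor expansion is exact
```

When the score is a polynomial of degree at most k, the Taylor expansion is exact. In that case \(L - C^k\) is constant, which makes a useful sanity check. Sweeping `sigma` upward in this toy is one way to probe the large-σ limitation noted in the contributions.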

Variance reduction of diffusion model's gradients with Taylor approximation-based control variate
