Abstract: As a highly expressive generative model, diffusion models have demonstrated exceptional success across various domains, including image generation, natural language processing, and combinatorial optimization. However, as data distributions grow more complex, training these models to convergence becomes increasingly computationally intensive. While diffusion models are typically trained using uniform timestep sampling, our research shows that the variance in stochastic gradients varies significantly across timesteps, with high-variance timesteps becoming bottlenecks that hinder faster convergence. To address this issue, we introduce a non-uniform timestep sampling method that prioritizes these more critical timesteps. Our method tracks the impact of gradient updates on the objective for each timestep, adaptively selecting those most likely to minimize the objective effectively. Experimental results demonstrate that this approach not only accelerates the training process, but also leads to improved performance at convergence. Furthermore, our method shows robust performance across various datasets, scheduling strategies, and diffusion architectures, outperforming previously proposed timestep sampling and weighting heuristics that lack this degree of robustness.

What problem does this paper attempt to address?

The main problem this paper addresses is the high computational cost of training diffusion models. As data distributions become more complex, training these models to convergence becomes very computationally intensive, leading to long training times and heavy resource consumption. For example, training an image-generation model may require thousands of GPU hours, which is a major obstacle to the advancement of generative AI applications and also carries an environmental cost.

To address this, the authors propose a non-uniform timestep sampling method that accelerates diffusion model training. Diffusion models are usually trained with uniform timestep sampling, but the stochastic gradients have high variance at certain timesteps, and these timesteps become a bottleneck that hinders faster convergence. The authors therefore introduce a method that adaptively selects key timesteps: it tracks the impact of gradient updates at each timestep on the objective function and preferentially selects the timesteps most likely to reduce it.

### Core contributions of the paper:

1. **Gradient variance analysis**: An analysis of gradient variance during diffusion model training explains why non-uniform timestep sampling can accelerate convergence.
2. **Learning-based sampling method**: Unlike previous heuristic-based acceleration methods, the paper proposes a learning-based method that adaptively samples timesteps to minimize the variational lower bound.
3. **Experimental verification**: Experiments across multiple image datasets, noise scheduling strategies, and diffusion model architectures demonstrate the method's robustness and superior performance.

### Method overview:

- **Gradient variance imbalance**: The authors observe that the variance of stochastic gradients differs significantly across timesteps, with especially high variance at early timesteps, which can make these timesteps bottlenecks in training.
- **Adaptive sampling algorithm**: To address this, the authors design an adaptive non-uniform timestep sampling algorithm that dynamically adjusts how often each timestep is sampled, based on the tracked effect of gradient updates at that timestep on the objective, so that the objective is optimized more effectively.
- **Experimental results**: Experiments show that the method both accelerates training and performs better at convergence, and that it is robust across various datasets and model architectures.

In conclusion, the paper addresses the computational efficiency problem in diffusion model training by introducing a new non-uniform timestep sampling method that improves both training speed and the performance of the final model.
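
The adaptive sampling idea in the method overview can be illustrated with a small sketch. This is not the authors' algorithm, which the summary describes only as learning-based. It is a hand-written heuristic stand-in that keeps a running estimate of how much an update at each timestep reduced the loss and samples timesteps in proportion to a softmax of those estimates. The class name `AdaptiveTimestepSampler` and the parameters `decay`, `temperature`, and `uniform_mix` are all hypothetical.

```python
import math
import random


class AdaptiveTimestepSampler:
    """Hypothetical heuristic sampler (not the paper's learned sampler).

    Timesteps whose recent updates reduced the loss the most are
    sampled more often; a uniform floor keeps every timestep reachable.
    """

    def __init__(self, num_timesteps, decay=0.9, temperature=1.0, uniform_mix=0.1):
        self.num_timesteps = num_timesteps
        self.decay = decay              # smoothing factor for the running scores
        self.temperature = temperature  # lower values sharpen the distribution
        self.uniform_mix = uniform_mix  # fraction of probability mass kept uniform
        self.scores = [0.0] * num_timesteps

    def probabilities(self):
        # Numerically stable softmax over the scores, mixed with a uniform floor.
        top = max(self.scores)
        exps = [math.exp((s - top) / self.temperature) for s in self.scores]
        total = sum(exps)
        floor = self.uniform_mix / self.num_timesteps
        return [(1.0 - self.uniform_mix) * e / total + floor for e in exps]

    def sample(self, rng=random):
        # Draw one timestep index according to the current distribution.
        weights = self.probabilities()
        return rng.choices(range(self.num_timesteps), weights=weights, k=1)[0]

    def update(self, t, loss_before, loss_after):
        # Exponential moving average of the observed loss reduction at timestep t.
        reduction = loss_before - loss_after
        self.scores[t] = self.decay * self.scores[t] + (1.0 - self.decay) * reduction


if __name__ == "__main__":
    # Toy demo with simulated loss reductions; no real model is trained here.
    rng = random.Random(0)
    sampler = AdaptiveTimestepSampler(num_timesteps=10)
    for _ in range(200):
        t = sampler.sample(rng)
        # Assumption for the demo only: lower-index timesteps yield larger reductions.
        simulated_reduction = 1.0 / (1 + t)
        sampler.update(t, loss_before=1.0, loss_after=1.0 - simulated_reduction)
    print([round(p, 3) for p in sampler.probabilities()])
```

In a real training loop, `loss_before` and `loss_after` would have to come from evaluating the objective around a gradient step, which adds cost that this sketch ignores. Sampling timesteps non-uniformly also changes the effective weighting of the per-timestep loss terms unless each term is rescaled by the inverse of its sampling probability; whether and how the paper handles this is not stated in the summary.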

Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training

Non-uniform Timestep Sampling: Towards Faster Diffusion Model Training

Accelerating Parallel Sampling of Diffusion Models

Accelerating Convergence of Score-Based Diffusion Models, Provably

Stochastic Runge-Kutta Methods: Provable Acceleration of Diffusion Models

A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models

Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models

Fast Sampling via Discrete Non-Markov Diffusion Models with Predetermined Transition Time

Non-Uniform Diffusion Models

AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration

Beta Sampling is All You Need: Efficient Image Generation Strategy for Diffusion Models using Stepwise Spectral Analysis

Choose Your Diffusion: Efficient and flexible ways to accelerate the diffusion model in fast high energy physics simulation

Improving Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architectures

Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner

Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy

Towards Faster Training of Diffusion Models: An Inspiration of A Consistency Phenomenon

Analyzing and Improving the Training Dynamics of Diffusion Models

A Geometric Perspective on Diffusion Models

Fast constrained sampling in pre-trained diffusion models

New algorithms for sampling and diffusion models

AdaDiff: Accelerating Diffusion Models through Step-Wise Adaptive Computation