Abstract: Diffusion models are emerging expressive generative models in which a large number of time steps (inference steps) are required to generate a single image. To accelerate this tedious process, reducing steps uniformly is treated as an undisputed principle of diffusion models. We argue that such a uniform assumption is not optimal in practice; i.e., different models have different optimal time steps. Therefore, we propose to search for the optimal time step sequence and compressed model architecture in a unified framework to achieve effective image generation for diffusion models without any further training. Specifically, we first design a unified search space that consists of all possible time steps and various architectures. Then, a two-stage evolutionary algorithm is introduced to find the optimal solution in the designed search space. To further accelerate the search process, we employ the FID score between generated and real samples to estimate the performance of the sampled examples. As a result, the proposed method is (i) training-free, obtaining the optimal time steps and model architecture without any training process; (ii) orthogonal to most advanced diffusion samplers and can be integrated with them to gain better sample quality; and (iii) generalized, in that the searched time steps and architectures can be directly applied to different diffusion models with the same guidance scale. Experimental results show that our method achieves excellent performance using only a few time steps, e.g., a 17.86 FID score on ImageNet 64 $\times$ 64 with only four steps, compared to 138.66 with DDIM. The code is available at https://github.com/lilijiangg/AutoDiffusion.
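To make the contrast between uniform step reduction and a searchable set of time step sequences concrete, here is a minimal sketch. It is not taken from the AutoDiffusion code: the function names, the 1000-step full schedule, and the validity rule (strictly decreasing, distinct integer steps) are assumptions chosen for illustration.

```python
def uniform_time_steps(total_steps: int, num_steps: int) -> list[int]:
    """Evenly strided subsequence of range(total_steps), in sampling order
    (largest, i.e. noisiest, time step first). This is the uniform baseline."""
    return [i * total_steps // num_steps for i in range(num_steps)][::-1]


def is_valid_candidate(steps: list[int], total_steps: int, max_len: int) -> bool:
    """Assumed membership test for a time step search space: any strictly
    decreasing sequence of at most max_len steps drawn from range(total_steps)."""
    if not 0 < len(steps) <= max_len:
        return False
    in_range = all(0 <= s < total_steps for s in steps)
    decreasing = all(a > b for a, b in zip(steps, steps[1:]))
    return in_range and decreasing


# The uniform baseline is one point in the space; a search may pick another.
baseline = uniform_time_steps(1000, 4)   # [750, 500, 250, 0]
candidate = [900, 600, 200, 50]          # hypothetical non-uniform sequence
print(baseline, is_valid_candidate(candidate, 1000, 4))
```

Under these assumptions the uniform schedule is just one candidate among many, which is why a search over sequences can do no worse than the uniform choice if the evaluation metric is reliable.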

What problem does this paper attempt to address?

The problem this paper attempts to address is that diffusion models require a large number of time steps (inference steps) to generate images, resulting in a very slow generation process. To accelerate this process, existing methods typically reduce the number of time steps, but these methods often use uniform reduction or specific function reduction, ignoring the fact that the optimal time step sequence may vary under different tasks and hyperparameters. Therefore, the paper proposes a new framework, AutoDiffusion, which aims to automatically search for the optimal time step sequence and compressed model architecture of pre-trained diffusion models without additional training, thereby achieving efficient image generation. Specifically, the paper proposes the following points:

1. **Problem Background**: Diffusion models perform excellently in generating high-quality samples, but the generation process is very time-consuming. Existing acceleration methods mainly focus on reducing the number of time steps, but these methods often do not consider the issue of selecting the optimal time step sequence.
2. **Research Motivation**: The authors believe that for a given diffusion model, there exists an optimal time step sequence, which varies with different tasks and model hyperparameters. By selecting the optimal time step sequence, the generation quality can be improved.
3. **Solution**: The paper proposes a new framework called AutoDiffusion, which combines Neural Architecture Search (NAS) technology and designs a unified search space that includes all possible time step sequences and various model architectures. Through a two-stage evolutionary algorithm, AutoDiffusion can find the optimal time step sequence and model architecture without additional training.
4. **Experimental Results**: Experimental results show that the AutoDiffusion method can significantly improve the quality of generated images with a small number of time steps and can further enhance performance when combined with existing advanced samplers.

In summary, the main contribution of this paper is to reveal the shortcomings of uniform sampling or using fixed functions to select time steps and propose a new framework to automatically search for the optimal time step sequence and model architecture, thereby significantly improving the generation speed and quality of diffusion models without increasing training costs.
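The idea of searching for a time step sequence with an evolutionary algorithm can be sketched as follows. This is a generic, single-population loop over time steps only, not the paper's two-stage procedure, and it does not search architectures. Every name here is hypothetical, and `toy_score` is a made-up stand-in for the FID evaluation, which in practice would require sampling images from a pre-trained model.

```python
import random

TOTAL_STEPS = 1000  # assumed length of the full time step schedule


def toy_score(steps):
    """Stand-in for FID (lower is better). Purely illustrative: distance to
    an arbitrary target sequence, NOT a real image quality metric."""
    target = [800, 500, 200, 50]
    return sum(abs(a - b) for a, b in zip(steps, target))


def random_candidate(rng, num_steps):
    """Draw distinct time steps and order them for sampling (descending)."""
    return sorted(rng.sample(range(TOTAL_STEPS), num_steps), reverse=True)


def mutate(rng, steps, prob=0.5):
    """Replace each step with a fresh random one with probability prob,
    keeping all steps in the sequence distinct."""
    pool, out = set(steps), []
    for s in steps:
        if rng.random() < prob:
            new = rng.randrange(TOTAL_STEPS)
            if new not in pool:
                pool.discard(s)
                pool.add(new)
                s = new
        out.append(s)
    return sorted(out, reverse=True)


def crossover(rng, a, b):
    """Child samples its steps from the union of both parents' steps."""
    union = sorted(set(a) | set(b))
    return sorted(rng.sample(union, len(a)), reverse=True)


def evolve(num_steps=4, pop_size=20, generations=30, top_k=5, seed=0):
    rng = random.Random(seed)
    population = [random_candidate(rng, num_steps) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=toy_score)
        parents = population[:top_k]  # elitism: the best candidates survive
        children = []
        while len(children) < pop_size - top_k:
            if rng.random() < 0.5:
                children.append(mutate(rng, rng.choice(parents)))
            else:
                children.append(crossover(rng, *rng.sample(parents, 2)))
        population = parents + children
    return min(population, key=toy_score)


best = evolve()
print(best, toy_score(best))
```

Because the top candidates are carried over unchanged each generation, the best score in this sketch never gets worse. A real search would swap `toy_score` for an FID estimate computed on generated samples, which is the expensive part of the loop.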

AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration

AdaDiff: Adaptive Step Selection for Fast Diffusion

AdaDiff: Accelerating Diffusion Models through Step-Wise Adaptive Computation

FRDiff : Feature Reuse for Universal Training-free Acceleration of Diffusion Models

PFDiff: Training-free Acceleration of Diffusion Models through the Gradient Guidance of Past and Future

Relational Diffusion Distillation for Efficient Image Generation

Improving Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architectures

Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner

Accelerated Image-Aware Generative Diffusion Modeling

Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy

Flexiffusion: Segment-wise Neural Architecture Search for Flexible Denoising Schedule

Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference

DeepCache: Accelerating Diffusion Models for Free

Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment

Accelerating Parallel Sampling of Diffusion Models

Analyzing and Improving the Training Dynamics of Diffusion Models

Dynamic Diffusion Transformer

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

$Δ$-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers

Accelerating Diffusion Sampling with Optimized Time Steps