AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration

Lijiang Li, Huixia Li, Xiawu Zheng, Jie Wu, Xuefeng Xiao, Rui Wang, Min Zheng, Xin Pan, Fei Chao, Rongrong Ji
2023-09-23
Abstract: Diffusion models are emerging expressive generative models, in which a large number of time steps (inference steps) are required for a single image generation. To accelerate such a tedious process, reducing steps uniformly is considered an undisputed principle of diffusion models. We consider that such a uniform assumption is not the optimal solution in practice; i.e., we can find different optimal time steps for different models. Therefore, we propose to search for the optimal time step sequence and compressed model architecture in a unified framework to achieve effective image generation for diffusion models without any further training. Specifically, we first design a unified search space that consists of all possible time steps and various architectures. Then, a two-stage evolutionary algorithm is introduced to find the optimal solution in the designed search space. To further accelerate the search process, we employ the FID score between generated and real samples to estimate the performance of the sampled examples. As a result, the proposed method is (i) training-free, obtaining the optimal time steps and model architecture without any training process; (ii) orthogonal to most advanced diffusion samplers, so it can be integrated with them to gain better sample quality; and (iii) generalized, where the searched time steps and architectures can be directly applied to different diffusion models with the same guidance scale. Experimental results show that our method achieves excellent performance by using only a few time steps, e.g. 17.86 FID score on ImageNet 64 $\times$ 64 with only four steps, compared to 138.66 with DDIM. The code is available at https://github.com/lilijiangg/AutoDiffusion.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the slow sampling of diffusion models, which need a large number of time steps (inference steps) to generate a single image. Existing acceleration methods typically reduce the number of time steps, but they select the remaining steps uniformly or with a fixed function, ignoring the fact that the optimal time step sequence may vary across tasks and hyperparameters. The paper therefore proposes AutoDiffusion, a framework that automatically searches for the optimal time step sequence and compressed model architecture of a pre-trained diffusion model without additional training, thereby achieving efficient image generation. Specifically, the paper makes the following points:

1. **Problem Background**: Diffusion models generate high-quality samples, but the generation process is very time-consuming. Existing acceleration methods mainly focus on reducing the number of time steps and often do not consider how to select the optimal time step sequence.
2. **Research Motivation**: The authors argue that for a given diffusion model there exists an optimal time step sequence, which varies with the task and the model hyperparameters. Selecting this sequence improves generation quality.
3. **Solution**: AutoDiffusion draws on Neural Architecture Search (NAS) and designs a unified search space that includes all possible time step sequences and various model architectures. A two-stage evolutionary algorithm then finds the optimal time step sequence and model architecture in this space without additional training, using the FID score between generated and real samples to rate each candidate.
4. **Experimental Results**: AutoDiffusion significantly improves the quality of images generated with a small number of time steps (e.g. 17.86 FID on ImageNet 64 $\times$ 64 with four steps, compared to 138.66 with DDIM) and can further enhance performance when combined with existing advanced samplers.

In summary, the main contribution of this paper is to reveal the shortcomings of selecting time steps uniformly or with fixed functions, and to propose a framework that automatically searches for the optimal time step sequence and model architecture, thereby significantly improving the generation speed and quality of diffusion models without increasing training costs.
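
To make the search idea concrete, here is a minimal Python sketch of an evolutionary search over time step sequences. It is not the authors' implementation: it is single-stage, it searches time steps only (no architecture search), and all names (`uniform_steps`, `mutate`, `crossover`, `evolve`, `toy_score`) are hypothetical. In particular, `toy_score` is an arbitrary stand-in for the FID evaluation; a real run would generate images with each candidate sequence and compute FID against real samples.

```python
import random


def uniform_steps(total_steps, k):
    """Evenly spaced baseline: k time steps out of range(total_steps)."""
    stride = total_steps / k
    return [int(i * stride) for i in range(k)]


def mutate(steps, total_steps, rng):
    """Swap one step for a random step that is not in the current sequence."""
    candidate = set(steps)
    candidate.remove(rng.choice(steps))
    unused = [t for t in range(total_steps) if t not in steps]
    candidate.add(rng.choice(unused))
    return sorted(candidate)


def crossover(a, b, k, rng):
    """Draw k distinct steps from the union of two parent sequences."""
    pool = sorted(set(a) | set(b))
    return sorted(rng.sample(pool, k))


def evolve(score_fn, total_steps, k, generations=30, population=20, top=5, seed=0):
    """Return the lowest-scoring sequence found (lower score is better)."""
    rng = random.Random(seed)
    # Seed the population with the uniform baseline plus random sequences.
    pop = [uniform_steps(total_steps, k)]
    pop += [sorted(rng.sample(range(total_steps), k)) for _ in range(population - 1)]
    for _ in range(generations):
        pop.sort(key=score_fn)
        parents = pop[:top]  # elitism: the best candidates always survive
        children = []
        while len(parents) + len(children) < population:
            if rng.random() < 0.5:
                children.append(mutate(rng.choice(parents), total_steps, rng))
            else:
                a, b = rng.sample(parents, 2)
                children.append(crossover(a, b, k, rng))
        pop = parents + children
    return min(pop, key=score_fn)


def toy_score(steps):
    """Hypothetical stand-in for FID: distance to an arbitrary non-uniform target."""
    target = [10, 60, 200, 600]
    return sum(abs(s - t) for s, t in zip(steps, target))


best = evolve(toy_score, total_steps=1000, k=4)
print(best, toy_score(best))
```

Because the uniform baseline is part of the initial population and the top candidates are carried over unchanged each generation, the returned sequence never scores worse than uniform spacing under the chosen score function. In the real setting the cost is dominated by the evaluation of each candidate rather than by the search loop itself.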