Learning Quantized Adaptive Conditions for Diffusion Models

Yuchen Liang, Yuchuan Tian, Lei Yu, Huao Tang, Jie Hu, Xiangzhong Fang, Hanting Chen
2024-09-26
Abstract: The curvature of ODE trajectories in diffusion models hinders their ability to generate high-quality images in a small number of function evaluations (NFE). In this paper, we propose a novel and effective approach to reduce trajectory curvature by utilizing adaptive conditions. By employing an extremely lightweight quantized encoder, our method incurs only an additional 1% of training parameters and eliminates the need for extra regularization terms, yet achieves significantly better sample quality. Our approach accelerates ODE sampling while preserving the downstream image editing capabilities of SDE techniques. Extensive experiments verify that our method can generate high-quality results under extremely limited sampling costs. With only 6 NFE, we achieve 5.14 FID on CIFAR-10, 6.91 FID on FFHQ 64x64 and 3.10 FID on AFHQv2.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The main problem this paper addresses is the curvature of ODE trajectories in diffusion models, which limits image generation quality when only a small number of function evaluations (NFE) is available. Specifically, the authors propose adaptive conditions, produced by an extremely lightweight quantized encoder, to reduce trajectory curvature. The method significantly improves sample quality and accelerates ODE sampling while retaining the ability of score-based models to perform downstream image editing tasks.

### Core Contributions of the Paper

1. **Relationship between forward flow intersection and few-step sampling quality**: The authors provide theoretical support that the degree of forward flow intersection is tied to the quality of few-step sampling, with less intersection corresponding to better few-step samples.
2. **A plug-in method**: The method reduces the degree of forward flow intersection at only a small additional training cost, and is the first such method that requires neither trajectory relocation nor additional regularization.
3. **Extensive experimental verification**: The authors run comparison and ablation experiments on the CIFAR-10, MNIST, FFHQ and AFHQv2 datasets to verify the method's performance in both few-step and full-sampling generation.

### Method Overview

- **Adaptive Conditions**: Adaptive conditions \(Y\) are used to distinguish the forward trajectories. These conditions can be regarded as pseudo-labels of the image data \(X_0\) and are independent of the noise \(X_1\).
- **Quantized Condition Encoder**: A quantized encoder avoids the posterior collapse problem and handles high-resolution image reconstruction effectively.
- **Online Sampling Weight Collection**: Two strategies for collecting sampling weights are proposed. The online collection strategy performs significantly better in both few-step and full-sampling generation.

### Experimental Results

- **Effect of the Quantized Condition Encoder**: Experiments with different condition encoders show that quantized adaptive conditions significantly improve the quality of generated images, especially in few-step generation.
- **Comparison with Existing Methods**: Experiments on the CIFAR-10, FFHQ and AFHQv2 datasets show that the method outperforms the baseline methods and other accelerated sampling techniques in both few-step and full-sampling generation.
- **Zero-shot Image Editing**: The method is also applicable to various image editing tasks, such as super-resolution, colorization and inpainting.

### Conclusion

The authors show that the degree of forward flow intersection directly affects few-step sampling performance, and they propose an efficient plug-in method that reduces the average reconstruction loss at a small additional training cost. The method retains the key properties of score-based models, is complementary to other acceleration methods, and significantly improves sample quality under a limited sampling budget.
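
To make the idea of quantized adaptive conditions more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The function names (`quantize`, `forward_interpolate`, `euler_sample`), the nearest-neighbor codebook lookup, the linear forward path \(X_t = (1 - t) X_0 + t X_1\), and the toy velocity field are all assumptions chosen for illustration. The toy case is deliberately idealized: the condition \(Y\) fully determines \(X_0\), so the conditional trajectory is a straight line and a few Euler steps recover the data point exactly.

```python
import numpy as np

def quantize(z, codebook):
    """Map each continuous code in z (N, D) to its nearest codebook row (K, D).

    Hypothetical stand-in for a quantized condition encoder's lookup step.
    """
    # Squared Euclidean distance between every code and every codebook entry.
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return codebook[idx], idx

def forward_interpolate(x0, x1, t):
    """Assumed linear forward path: X_t = (1 - t) * X_0 + t * X_1."""
    return (1.0 - t) * x0 + t * x1

def euler_sample(velocity, x1, y, nfe):
    """Integrate dx/dt = velocity(x, t, y) from t=1 (noise) to t=0 (data).

    Uses exactly `nfe` Euler steps, i.e. `nfe` calls to `velocity`.
    """
    ts = np.linspace(1.0, 0.0, nfe + 1)
    x = x1
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * velocity(x, t_cur, y)
    return x

# Toy usage: three continuous codes are snapped to a 3-entry codebook.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
z = np.array([[0.9, 1.2], [-0.8, 1.7], [0.1, -0.2]])
y, idx = quantize(z, codebook)

# Idealized assumption: the condition y equals the data point X_0, so the
# conditional velocity along the linear path is (x - y) / t.
toy_velocity = lambda x, t, y: (x - y) / t

rng = np.random.default_rng(0)
x1 = rng.standard_normal(y.shape)          # noise sample X_1
x0_hat = euler_sample(toy_velocity, x1, y, nfe=6)
```

In this idealized setting the sampler lands on `y` for any NFE, because nothing bends the path. In a real model the condition only partially separates the forward trajectories, so the sketch illustrates the direction of the effect rather than its size.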