Temporal Dynamic Quantization for Diffusion Models

Junhyuk So, Jungwon Lee, Daehyun Ahn, Hyungjun Kim, Eunhyeok Park
2023-12-12
Abstract: The diffusion model has gained popularity in vision applications due to its remarkable generative performance and versatility. However, high storage and computation demands, resulting from the model size and iterative generation, hinder its use on mobile devices. Existing quantization techniques struggle to maintain performance even in 8-bit precision due to the diffusion model's unique property of temporal variation in activation. We introduce a novel quantization method that dynamically adjusts the quantization interval based on time step information, significantly improving output quality. Unlike conventional dynamic quantization techniques, our approach has no computational overhead during inference and is compatible with both post-training quantization (PTQ) and quantization-aware training (QAT). Our extensive experiments demonstrate substantial improvements in output quality with the quantized diffusion model across various datasets.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the high storage and computation requirements that make diffusion models hard to deploy on mobile devices. Diffusion models are increasingly popular in vision applications because of their generative performance and versatility, but their model size and iterative generation process limit their use on resource-constrained hardware. Existing quantization techniques struggle to maintain performance even at 8-bit precision, because the activations of a diffusion model vary across time steps. The paper therefore proposes a new quantization method, Temporal Dynamic Quantization (TDQ), which dynamically adjusts the quantization interval based on time-step information and significantly improves output quality. Unlike conventional dynamic quantization techniques, TDQ adds no computational overhead during inference, and it is compatible with both post-training quantization (PTQ) and quantization-aware training (QAT). The experiments show substantial improvements in the output quality of quantized diffusion models across multiple datasets.
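
To make the idea of a time-step-dependent quantization interval concrete, the sketch below compares a single static interval against one interval per time step on synthetic data. It is a simplified illustration, not the paper's implementation: the per-step lookup table, the max-abs calibration rule, the synthetic activation ranges, and all function names (`calibrate_intervals`, `fake_quantize`, `mean_squared_error`) are assumptions made for this example. Because the intervals depend only on the time step, they can be computed before inference, which is one way a method of this kind can avoid runtime overhead.

```python
import random


def calibrate_intervals(activations_per_step, num_bits=8):
    """Compute one quantization interval (step size) per time step.

    Hypothetical calibration rule: max-abs of the observed activations
    divided by the largest positive integer level.
    """
    qmax = 2 ** (num_bits - 1) - 1
    intervals = []
    for acts in activations_per_step:
        max_abs = max(abs(a) for a in acts)
        intervals.append(max(max_abs, 1e-8) / qmax)
    return intervals


def fake_quantize(values, interval, num_bits=8):
    """Quantize to signed integers, clip, and map back to real values."""
    qmin = -(2 ** (num_bits - 1))
    qmax = 2 ** (num_bits - 1) - 1
    out = []
    for v in values:
        q = max(qmin, min(qmax, round(v / interval)))
        out.append(q * interval)
    return out


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic activations whose spread shrinks over time steps.
# This only mimics "temporal variation in activation"; it is not real data.
rng = random.Random(0)
spreads = [8.0, 4.0, 1.0, 0.25]
activations = [[rng.gauss(0.0, s) for _ in range(1000)] for s in spreads]

# Dynamic: one precomputed interval per time step (a lookup table).
dynamic_intervals = calibrate_intervals(activations)
# Static: a single interval wide enough to cover every time step.
static_interval = max(dynamic_intervals)

static_err = []
dynamic_err = []
for t, acts in enumerate(activations):
    static_err.append(
        mean_squared_error(acts, fake_quantize(acts, static_interval))
    )
    dynamic_err.append(
        mean_squared_error(acts, fake_quantize(acts, dynamic_intervals[t]))
    )

for t in range(len(spreads)):
    print(f"step {t}: static MSE={static_err[t]:.3e}, "
          f"dynamic MSE={dynamic_err[t]:.3e}")
```

In this toy setting the static interval is set by the time step with the widest range, so steps with narrow activations use only a small part of the integer grid and incur a larger quantization error than with a per-step interval.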