PTQD: Accurate Post-Training Quantization for Diffusion Models

Yefei He, Luping Liu, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
2023-11-01
Abstract: Diffusion models have recently dominated image synthesis tasks. However, the iterative denoising process is expensive in computations at inference time, making diffusion models less practical for low-latency and scalable real-world applications. Post-training quantization (PTQ) of diffusion models can significantly reduce the model size and accelerate the sampling process without re-training. Nonetheless, applying existing PTQ methods directly to low-bit diffusion models can significantly impair the quality of generated samples. Specifically, for each denoising step, quantization noise leads to deviations in the estimated mean and mismatches with the predetermined variance schedule. As the sampling process proceeds, the quantization noise may accumulate, resulting in a low signal-to-noise ratio (SNR) during the later denoising steps. To address these challenges, we propose a unified formulation for the quantization noise and diffusion perturbed noise in the quantized denoising process. Specifically, we first disentangle the quantization noise into its correlated and residual uncorrelated parts regarding its full-precision counterpart. The correlated part can be easily corrected by estimating the correlation coefficient. For the uncorrelated part, we subtract the bias from the quantized results to correct the mean deviation and calibrate the denoising variance schedule to absorb the excess variance resulting from quantization. Moreover, we introduce a mixed-precision scheme for selecting the optimal bitwidth for each denoising step. Extensive experiments demonstrate that our method outperforms previous post-training quantized diffusion models, with only a 0.06 increase in FID score compared to full-precision LDM-4 on ImageNet 256x256, while saving 19.9x bit operations. Code is available at https://github.com/ziplab/PTQD.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper "PTQD: Accurate Post-Training Quantization for Diffusion Models" addresses the high computational cost and large model size that make diffusion models impractical for low-latency and scalable real-world applications. It focuses on two main problems:

1. **High computational cost**: Diffusion models require many iterative denoising steps during inference, which makes them slow, especially for real-time use. For example, even on a high-performance platform (such as an RTX 3090), Stable Diffusion with the DPM-Solver sampler still takes more than one second to generate a 512×512 image.
2. **Large model size**: Diffusion models usually have a large number of parameters and high computational complexity, which limits the devices they can run on. For example, running Stable Diffusion requires 16GB of memory and more than 10GB of video memory, which is infeasible for most consumer-level PCs and resource-constrained edge devices.

To address these problems, the paper proposes PTQD, a post-training quantization framework that reduces the model size and accelerates sampling without retraining, while maintaining the quality of the generated samples. It tackles the following technical challenges:

- **Bias and additional variance introduced by quantization noise**: At each denoising step, quantization noise shifts the estimated mean and creates a mismatch with the predetermined variance schedule. As iterative sampling proceeds, this noise accumulates, so the signal-to-noise ratio (SNR) drops significantly in the later denoising steps and the quality of the generated images degrades. PTQD splits the quantization noise into a part correlated with the full-precision output and an uncorrelated residual. It corrects the correlated part by estimating the correlation coefficient, subtracts the bias of the residual from the quantized output, and calibrates the variance schedule to absorb the residual's excess variance.
- **Low SNR of low-bit quantized models**: The SNR of low-bit quantized models falls significantly in the later denoising steps, which makes it difficult to generate high-quality samples. The paper introduces a step-aware mixed-precision scheme that selects the bitwidth for each denoising step so that a high SNR is maintained.

With these methods, the paper reduces computational cost and model size while preserving sample quality: it reports only a 0.06 increase in FID compared to full-precision LDM-4 on ImageNet 256x256, while saving 19.9x bit operations.
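The noise-correction idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (that is in the linked repository). It assumes the quantization noise is modeled as `k * fp_out + residual`, with `k` fitted as covariance over variance on calibration samples. The function names, the synthetic data, and the `coeff` factor (standing in for whatever sampler-dependent coefficient scales the noise estimate) are all hypothetical.

```python
import random
from statistics import fmean, pvariance


def decompose_quantization_noise(fp_out, q_out):
    """Fit q_out - fp_out = k * fp_out + residual on calibration samples.

    k is covariance(noise, fp_out) / variance(fp_out), so the residual has
    zero covariance with fp_out. Returns k plus the residual's mean (bias)
    and variance.
    """
    noise = [q - f for q, f in zip(q_out, fp_out)]
    mean_f, mean_n = fmean(fp_out), fmean(noise)
    cov = fmean([(n - mean_n) * (f - mean_f) for n, f in zip(noise, fp_out)])
    k = cov / pvariance(fp_out)
    residual = [n - k * f for n, f in zip(noise, fp_out)]
    return k, fmean(residual), pvariance(residual)


def correct_output(q_out, k, bias):
    """Remove the residual bias, then undo the correlated scaling (1 + k)."""
    return [(q - bias) / (1.0 + k) for q in q_out]


def calibrate_variance(sigma_sq, coeff, residual_var):
    """Shrink the scheduled variance by the excess variance from quantization.

    coeff is a hypothetical sampler-dependent factor applied to the noise
    estimate. The result is clamped at zero when the excess is larger than
    the scheduled variance.
    """
    return max(sigma_sq - coeff ** 2 * residual_var, 0.0)


# Synthetic calibration data (an assumption for illustration only):
# the "quantized" output is a scaled, shifted, noisy copy of the
# full-precision output.
rng = random.Random(0)
fp = [rng.gauss(0.0, 1.0) for _ in range(5000)]
q = [1.1 * f + 0.05 + rng.gauss(0.0, 0.02) for f in fp]

k, bias, var = decompose_quantization_noise(fp, q)
corrected = correct_output(q, k, bias)
# After dividing by (1 + k), the leftover residual variance shrinks too.
leftover_var = var / (1.0 + k) ** 2
```

On this synthetic data the fitted `k` and `bias` land close to the 0.1 and 0.05 used to generate it, and the corrected output has no mean offset from the full-precision output. The leftover variance is what the calibrated schedule would then need to absorb.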