Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing

Siao Tang, Xin Wang, Hong Chen, Chaoyu Guan, Zewen Wu, Yansong Tang, Wenwu Zhu
2024-07-08
Abstract: High computational overhead is a troublesome problem for diffusion models. Recent studies have leveraged post-training quantization (PTQ) to compress diffusion models. However, most of them only focus on unconditional models, leaving the quantization of widely-used pretrained text-to-image models, e.g., Stable Diffusion, largely unexplored. In this paper, we propose a novel post-training quantization method PCR (Progressive Calibration and Relaxing) for text-to-image diffusion models, which consists of a progressive calibration strategy that considers the accumulated quantization error across timesteps, and an activation relaxing strategy that improves the performance with negligible cost. Additionally, we demonstrate the previous metrics for text-to-image diffusion model quantization are not accurate due to the distribution gap. To tackle the problem, we propose a novel QDiffBench benchmark, which utilizes data in the same domain for more accurate evaluation. Besides, QDiffBench also considers the generalization performance of the quantized model outside the calibration dataset. Extensive experiments on Stable Diffusion and Stable Diffusion XL demonstrate the superiority of our method and benchmark. Moreover, we are the first to achieve quantization for Stable Diffusion XL while maintaining the performance.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper targets the high computational cost of text-to-image diffusion models and uses post-training quantization to reduce it. Specifically, it addresses three problems:

1. **High Computational Cost**: Diffusion models need many denoising steps to generate an image, which leads to high time and memory consumption. The cost is especially significant for large pre-trained models such as Stable Diffusion and Stable Diffusion XL.
2. **Limitations of Quantization Methods**: Existing quantization methods focus mainly on unconditional diffusion models, and research on quantizing widely used pre-trained text-to-image models (such as Stable Diffusion) is scarce. Existing methods also overlook the quantization error that accumulates across timesteps, and the fact that denoising steps differ in how sensitive they are with respect to image fidelity and text-image matching.
3. **Inaccuracy of Evaluation Metrics**: Current evaluation metrics (such as FID) cannot accurately assess the performance of quantized models because of the distribution gap problem.

To address these issues, the authors propose a new post-training quantization method called PCR (Progressive Calibration and Relaxing) and a benchmark, QDiffBench, for evaluating quantized text-to-image diffusion models. The specific contributions are:

1. **PCR Method**: A progressive calibration strategy that accounts for the quantization error accumulated across timesteps, and an activation relaxing strategy that improves performance at negligible additional cost.
2. **QDiffBench Benchmark**: A more accurate FID computation that uses data from the same domain, plus an evaluation of how well the quantized model generalizes beyond the calibration dataset.
3. **Extensive Experimental Validation**: Experiments on Stable Diffusion and Stable Diffusion XL demonstrate the superiority of the proposed method and benchmark.
4. **First Quantization of Stable Diffusion XL**: The authors report being the first to quantize Stable Diffusion XL while maintaining its performance. It is one of the largest diffusion models to date, with about 3.5 billion parameters.

Through these contributions, the paper provides a new method and a new evaluation standard for the efficient quantization of text-to-image diffusion models.
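
To make the notion of accumulated quantization error concrete, here is a minimal Python sketch. It is not the PCR method or any code from the paper: `fake_quantize` is plain min-max uniform quantization, and `toy_denoise_step` is a hypothetical stand-in for one denoising step, with all names, shapes, and bit-widths chosen purely for illustration. The loop feeds each quantized step the output of the previous quantized step, so the gap to the full-precision trajectory reflects error carried across timesteps rather than the error of a single step in isolation.

```python
import numpy as np


def fake_quantize(x, num_bits=8):
    """Min-max uniform quantization followed by dequantization.

    Returns a float array restricted to 2**num_bits evenly spaced levels.
    """
    qmax = 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    # Guard against a constant tensor, where the range would be zero.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.clip(np.round((x - lo) / scale), 0, qmax)
    return q * scale + lo


def toy_denoise_step(x, weight):
    """Hypothetical stand-in for one denoising step (not a diffusion model)."""
    return np.tanh(x @ weight)


def accumulated_error(num_steps=10, num_bits=4, dim=16, seed=0):
    """Track the mean squared gap between full-precision and quantized runs."""
    rng = np.random.default_rng(seed)
    weight = rng.normal(scale=0.5, size=(dim, dim))
    weight_q = fake_quantize(weight, num_bits)  # quantized weights

    x_fp = rng.normal(size=(4, dim))  # full-precision trajectory
    x_q = x_fp.copy()                 # quantized trajectory, same start

    errors = []
    for _ in range(num_steps):
        x_fp = toy_denoise_step(x_fp, weight)
        # The quantized step consumes its own previous (already perturbed)
        # output, and its activations are quantized as well.
        x_q = fake_quantize(toy_denoise_step(x_q, weight_q), num_bits)
        errors.append(float(np.mean((x_fp - x_q) ** 2)))
    return errors


if __name__ == "__main__":
    for step, err in enumerate(accumulated_error(), start=1):
        print(f"step {step:2d}: mse vs. full precision = {err:.6f}")
```

Calibrating every step only on full-precision inputs would ignore the fact that `x_q` has already drifted from `x_fp` by the time later steps run. According to the abstract, that mismatch is what the progressive calibration strategy is designed to account for.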