Abstract: Recent advancements in diffusion models, particularly the architectural shift from UNet-based diffusion models to the Diffusion Transformer (DiT), have significantly improved the quality and scalability of image synthesis. Despite their impressive generative quality, the large computational requirements of these large-scale models significantly hinder their deployment in real-world scenarios. Post-training Quantization (PTQ) offers a promising solution by compressing model size and speeding up inference for pretrained models without retraining. However, we observe that existing PTQ frameworks, designed exclusively for either ViTs or conventional diffusion models, fall into biased quantization and suffer remarkable performance degradation on DiTs. In this paper, we find that DiTs typically exhibit considerable variance in both weights and activations, which can easily exceed the limited range of low-bit numerical representations. To address this issue, we devise Q-DiT, which seamlessly integrates three techniques: fine-grained quantization to manage the substantial variance across input channels of weights and activations, an automatic search strategy to optimize the quantization granularity and mitigate redundancies, and dynamic activation quantization to capture activation changes across timesteps. Extensive experiments on the ImageNet dataset demonstrate the effectiveness of the proposed Q-DiT. Specifically, when quantizing DiT-XL/2 to W8A8 on ImageNet 256x256, Q-DiT reduces FID by 1.26 compared to the baseline. Under a W4A8 setting, it maintains high fidelity in image generation with only a marginal increase in FID, setting a new benchmark for efficient, high-quality quantization in diffusion transformers.
Code is available at https://github.com/Juanerx/Q-DiT.
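The abstract does not spell out the quantizer itself, so the following is only a minimal sketch, assuming a standard asymmetric min-max uniform quantizer, of why fine-grained (group-wise) quantization along input channels helps when a few input channels are much larger in magnitude than the rest. The helper names (`fake_quantize`, `quantize_grouped`, `mse`) and the toy weight values are hypothetical and are not taken from the Q-DiT code.

```python
def fake_quantize(values, n_bits):
    """Asymmetric min-max uniform quantization of one group, then dequantization."""
    lo = min(min(values), 0.0)  # include 0 so that zero is exactly representable
    hi = max(max(values), 0.0)
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    zero_point = round(-lo / scale)
    out = []
    for v in values:
        q = min(max(round(v / scale) + zero_point, 0), levels)  # clamp to the integer grid
        out.append((q - zero_point) * scale)
    return out


def quantize_grouped(weight, n_bits, group_size):
    """Quantize each output row in contiguous groups of `group_size` input channels."""
    out = []
    for row in weight:
        new_row = []
        for start in range(0, len(row), group_size):
            new_row.extend(fake_quantize(row[start:start + group_size], n_bits))
        out.append(new_row)
    return out


def mse(a, b):
    """Mean squared error between two equally shaped 2-D lists."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# Hypothetical 2 x 64 weight: 8 large "outlier" input channels and 56 small ones.
outliers = [10.0, -9.0, 7.5, -6.0, 8.0, -10.0, 5.0, 9.5]
small = [0.05 * ((i % 7) - 3) for i in range(56)]
weight = [outliers + small, small + outliers]

per_row = quantize_grouped(weight, n_bits=4, group_size=64)  # one scale per output row
grouped = quantize_grouped(weight, n_bits=4, group_size=8)   # one scale per 8 input channels
print(f"per-row MSE: {mse(weight, per_row):.5f}")
print(f"group-8 MSE: {mse(weight, grouped):.5f}")
```

With a single scale per output row, the 4-bit step size is dictated by the outlier channels, so every small weight rounds to zero. Giving each group of 8 input channels its own scale leaves the outlier group unchanged in this toy example while resolving the small weights. Smaller groups also mean more stored scales, which is the kind of overhead a group-size search has to trade off.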

What problem does this paper attempt to address?

The problem that this paper attempts to solve is **the significant performance degradation of the Diffusion Transformer (DiT) after quantization**. Diffusion models (especially after the transition from the UNet architecture to the Diffusion Transformer) have made remarkable progress in the quality and scalability of image synthesis, but these large-scale models have very high computational requirements, which limits their deployment in practical scenarios. Post-Training Quantization (PTQ) is a promising way to compress model size and accelerate inference without retraining. However, applying existing PTQ frameworks to the Diffusion Transformer leads to significant performance degradation.

### Main problems and challenges

1. **Significant variance in weights and activations**: The Diffusion Transformer exhibits large variance in its weights and activations, which can easily exceed the limited range of low-bit numerical representations.
2. **Limitations of existing quantization methods**: Existing PTQ methods are mainly designed for ViTs and conventional diffusion models and do not adapt well to the characteristics of the Diffusion Transformer.
3. **Activation distribution changes across timesteps**: The activation distribution of the Diffusion Transformer varies across timesteps, so quantization parameters calibrated at a specific timestep do not generalize to all timesteps.
4. **Redundant group-size configuration**: The default group-size configuration is not always optimal, and reducing the group size does not always improve quantization performance.

### Solutions

To solve these problems, the authors propose **Q-DiT**, an accurate post-training quantization method designed specifically for the Diffusion Transformer. Q-DiT combines the following three techniques:

1. **Fine-grained quantization**: A fine-grained quantization strategy manages the significant variance of weights and activations across input channels.
2. **Automatic search strategy**: An evolutionary search algorithm finds the optimal group-size configuration, optimizing the quantization granularity and reducing redundancy.
3. **Dynamic activation quantization**: A dynamic activation quantization mechanism adapts to changes in the activation distribution across timesteps.

### Experimental results

Q-DiT performs strongly on the ImageNet dataset. Under the W8A8 configuration, Q-DiT achieves almost lossless compression (the abstract reports an FID reduction of 1.26 over the baseline for DiT-XL/2 at 256x256); under the W4A8 configuration, it maintains high image-generation fidelity with only a slight increase in FID. This indicates that Q-DiT sets a new benchmark for efficient, high-quality quantization.

### Summary

The main contributions of the paper include:

- A fine-grained quantization method for the Diffusion Transformer that manages the input-channel variance of weights and activations, combined with dynamic activation quantization that adapts to activation changes across timesteps.
- An evolutionary search strategy that optimizes the group-size configuration, improving quantization efficiency and effectiveness.
- Extensive experiments on the ImageNet dataset that verify the effectiveness of Q-DiT and demonstrate its superior performance under low-bit quantization.

Through these improvements, Q-DiT addresses the performance degradation of the Diffusion Transformer after quantization and provides a new benchmark for efficient, high-fidelity image generation.
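The contrast between static and dynamic activation quantization can be illustrated with a small sketch. It assumes an asymmetric min-max uniform quantizer and uses a single range per tensor for brevity, whereas the summary describes Q-DiT as also applying fine-grained quantization to activations. The function names and the per-timestep activation values are hypothetical, not taken from the paper or its code.

```python
def fake_quantize(values, n_bits, lo, hi):
    """Asymmetric uniform quantization to n_bits over [lo, hi], then dequantization."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # keep zero exactly representable
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return [(min(max(round(v / scale) + zero_point, 0), levels) - zero_point) * scale
            for v in values]


def static_quantize(values, n_bits, calib_range):
    """Static: reuse a (lo, hi) range calibrated once, e.g. at a single timestep."""
    return fake_quantize(values, n_bits, *calib_range)


def dynamic_quantize(values, n_bits):
    """Dynamic: recompute (lo, hi) from the activations seen at this call."""
    return fake_quantize(values, n_bits, min(values), max(values))


def mse(a, b):
    """Mean squared error between two equally long lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical activations whose range grows with the timestep index.
activations = {t: [0.05 * t * ((i % 9) - 4) for i in range(64)] for t in (1, 10, 50)}

# Calibrate the static range on timestep 1 only.
calib = (min(activations[1]), max(activations[1]))

for t, act in activations.items():
    err_static = mse(act, static_quantize(act, 8, calib))
    err_dynamic = mse(act, dynamic_quantize(act, 8))
    print(f"t={t:>2}  static MSE={err_static:.6f}  dynamic MSE={err_dynamic:.6f}")
```

At the calibration timestep both variants coincide; at the other toy timesteps the static range clips most activations, while the dynamic range follows the data. The cost of the dynamic variant is computing the range at inference time.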

Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers

PTQ4DiT: Post-training Quantization for Diffusion Transformers

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

TaQ-DiT: Time-aware Quantization for Diffusion Transformers

VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers

HQ-DiT: Efficient Diffusion Transformer with FP4 Hybrid Quantization

PTQD: Accurate Post-Training Quantization for Diffusion Models

Q-Diffusion: Quantizing Diffusion Models

DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing

TerDiT: Ternary Diffusion Models with Transformers

An Analysis on Quantizing Diffusion Transformers

PackQViT: Faster Sub-8-bit Vision Transformers Via Full and Packed Quantization on the Mobile

EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models

QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning

Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion

Towards Accurate Post-training Quantization for Diffusion Models

Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing

QNCD: Quantization Noise Correction for Diffusion Models


DilateQuant: Accurate and Efficient Diffusion Quantization via Weight Dilation

Temporal Feature Matters: A Framework for Diffusion Model Quantization