Abstract: Zero-shot quantization (ZSQ) using synthetic data is a key approach for post-training quantization (PTQ) under privacy and security constraints. However, existing data generation methods often struggle to effectively generate data suitable for hardware-friendly quantization, where all model layers are quantized. We analyze existing data generation methods based on batch normalization (BN) matching and identify several gaps between synthetic and real data: 1) Current generation algorithms do not optimize the entire synthetic dataset simultaneously; 2) Data augmentations applied during training are often overlooked; and 3) A distribution shift occurs in the final model layers due to the absence of BN in those layers. These gaps negatively impact ZSQ performance, particularly in hardware-friendly quantization scenarios. In this work, we propose Data Generation for Hardware-friendly quantization (DGH), a novel method that addresses these gaps. DGH jointly optimizes all generated images, regardless of the image set size or GPU memory constraints. To address data augmentation mismatches, DGH includes a preprocessing stage that mimics the augmentation process and enhances image quality by incorporating natural image priors. Finally, we propose a new distribution-stretching loss that aligns the support of the feature map distribution between real and synthetic data. This loss is applied to the model's output and can be adapted to various tasks. DGH demonstrates significant improvements in quantization performance across multiple tasks, achieving up to a 30% increase in accuracy for hardware-friendly ZSQ in both classification and object detection, often performing on par with real data.
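To illustrate why the support of the output distribution matters once the final layers are quantized too, here is a minimal sketch of uniform quantization with a range calibrated on one set of activations and evaluated on another. The activation values, the min/max calibration rule, and the helper names `quantize` and `max_error` are illustrative assumptions, not taken from the paper, and the sketch does not implement DGH's distribution-stretching loss.

```python
def quantize(x, lo, hi, bits=8):
    """Uniformly quantize x to 2**bits levels on [lo, hi], clipping values outside."""
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    clipped = min(max(x, lo), hi)
    return lo + round((clipped - lo) / scale) * scale


def max_error(calibration, evaluation, bits=8):
    """Calibrate the range by min/max on one set, return worst-case error on another."""
    lo, hi = min(calibration), max(calibration)
    return max(abs(x - quantize(x, lo, hi, bits)) for x in evaluation)


# Hypothetical output-layer activations (made up for illustration).
real_outputs = [-8.0, -2.0, 0.0, 3.0, 8.0]
synthetic_outputs = [-2.0, -1.0, 0.0, 1.0, 2.0]  # narrower support than the real data

# Range calibrated on synthetic data: real values at +/-8 are clipped to +/-2.
err_synthetic_calibration = max_error(synthetic_outputs, real_outputs)

# Range calibrated on real data: error is bounded by half a quantization step.
err_real_calibration = max_error(real_outputs, real_outputs)
```

In this toy setup the error under synthetic calibration is dominated by clipping, while calibration on data with the correct support leaves only rounding error. This is one way to read the motivation for aligning the support of synthetic and real outputs.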

What problem does this paper attempt to address?

The paper addresses the shortcomings of existing data generation methods for hardware-friendly post-training quantization (PTQ). Existing zero-shot quantization (ZSQ) methods can quantize a model using synthetic data under privacy and security constraints, but they struggle to generate data suitable for hardware-friendly quantization, in which all model layers are quantized. The authors analyze existing data generation methods based on batch normalization (BN) matching and identify three key issues:

1. **Inconsistent statistical aggregation range**: BN layers aggregate statistics across the entire dataset during training, whereas existing data generation techniques usually optimize each batch or each image independently to match the BN statistics, ignoring the global statistics and diversity of the whole dataset.
2. **Ignoring the impact of data augmentation**: Data augmentation applied during training affects the BN statistics, but this effect is often overlooked in synthetic data generation.
3. **Output distribution mismatch**: Because BN layers are usually absent from the last few layers of the model, the feature map distributions of generated and real data differ in those layers, which is particularly harmful for hardware-friendly quantization schemes.

To address these issues, the authors propose **DGH (Data Generation for Hardware-friendly quantization)**, which improves data generation in three ways:

- **Global statistical aggregation**: DGH optimizes all generated images jointly instead of batch by batch or image by image, so that the generated set as a whole reflects the statistics of the entire training dataset.
- **Data augmentation preprocessing**: DGH introduces a preprocessing stage that mimics the data augmentation used during training and improves image quality by incorporating natural image priors.
- **Output distribution stretching loss (ODSL)**: DGH proposes a new loss function that aligns the support of the feature distributions of real and synthetic data at the model output, so that the generated data supports quantization of the entire model.

Experimental results show that DGH significantly improves quantization performance across multiple tasks, including classification and object detection, often performing on par with real data. In hardware-friendly quantization scenarios in particular, DGH achieves an accuracy improvement of up to 30%.

In summary, the paper aims to overcome the limitations of existing ZSQ methods in hardware-friendly quantization by improving data generation, enabling higher quantization accuracy when all layers of the model are quantized.
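The difference between per-batch and dataset-level BN matching can be sketched for a single channel as follows. This is a minimal illustration under stated assumptions: the squared-error loss, the toy activations, the stored BN statistics, and the helper names `batch_stats`, `aggregate_stats`, and `bn_matching_loss` are hypothetical and are not the paper's implementation. The aggregation step itself is just the law of total variance.

```python
from statistics import fmean, pvariance


def batch_stats(batch):
    """Return (count, mean, biased variance) for one batch of scalar activations."""
    return len(batch), fmean(batch), pvariance(batch)


def aggregate_stats(per_batch):
    """Combine per-batch (count, mean, variance) into dataset-level mean and variance.

    Law of total variance: the global variance is the weighted mean of the
    within-batch variances plus the weighted variance of the batch means.
    """
    total = sum(n for n, _, _ in per_batch)
    mean = sum(n * m for n, m, _ in per_batch) / total
    var = sum(n * (v + (m - mean) ** 2) for n, m, v in per_batch) / total
    return mean, var


def bn_matching_loss(mean, var, bn_mean, bn_var):
    """Squared distance between measured statistics and stored BN statistics."""
    return (mean - bn_mean) ** 2 + (var - bn_var) ** 2


# Hypothetical activations of one channel for two synthetic batches.
batches = [[-1.0, -1.0, -1.0, -1.0], [1.0, 1.0, 1.0, 1.0]]
bn_mean, bn_var = 0.0, 1.0  # assumed stored BN statistics for this channel

per_batch = [batch_stats(b) for b in batches]
global_mean, global_var = aggregate_stats(per_batch)

# Loss when statistics are aggregated over the whole synthetic set.
global_loss = bn_matching_loss(global_mean, global_var, bn_mean, bn_var)

# Losses when every batch is required to match the BN statistics on its own.
per_batch_losses = [bn_matching_loss(m, v, bn_mean, bn_var) for _, m, v in per_batch]
```

Here the two batches together match the stored statistics exactly even though neither batch does on its own. A per-batch objective would therefore penalize a set that is already consistent with the dataset-level statistics, which is the kind of lost diversity the first issue describes.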

Data Generation for Hardware-Friendly Post-Training Quantization
