Data Generation for Hardware-Friendly Post-Training Quantization

Lior Dikstein, Ariel Lapid, Arnon Netzer, Hai Victor Habi
2024-10-29
Abstract: Zero-shot quantization (ZSQ) using synthetic data is a key approach for post-training quantization (PTQ) under privacy and security constraints. However, existing data generation methods often struggle to effectively generate data suitable for hardware-friendly quantization, where all model layers are quantized. We analyze existing data generation methods based on batch normalization (BN) matching and identify several gaps between synthetic and real data: 1) Current generation algorithms do not optimize the entire synthetic dataset simultaneously; 2) Data augmentations applied during training are often overlooked; and 3) A distribution shift occurs in the final model layers due to the absence of BN in those layers. These gaps negatively impact ZSQ performance, particularly in hardware-friendly quantization scenarios. In this work, we propose Data Generation for Hardware-friendly quantization (DGH), a novel method that addresses these gaps. DGH jointly optimizes all generated images, regardless of the image set size or GPU memory constraints. To address data augmentation mismatches, DGH includes a preprocessing stage that mimics the augmentation process and enhances image quality by incorporating natural image priors. Finally, we propose a new distribution-stretching loss that aligns the support of the feature map distribution between real and synthetic data. This loss is applied to the model's output and can be adapted to various tasks. DGH demonstrates significant improvements in quantization performance across multiple tasks, achieving up to a 30% increase in accuracy for hardware-friendly ZSQ in both classification and object detection, often performing on par with real data.
Machine Learning
What problem does this paper attempt to address?
The paper addresses the shortcomings of existing data generation methods for hardware-friendly post-training quantization (PTQ). Zero-shot quantization (ZSQ) methods can quantize a model with synthetic data when privacy or security constraints rule out access to real data. However, they struggle to generate data suitable for hardware-friendly quantization, in which every layer of the model is quantized. The authors analyze existing data generation methods based on batch normalization (BN) statistics matching and identify three key gaps between synthetic and real data:

1. **Mismatched scope of statistics aggregation**: During training, BN layers aggregate statistics over the entire dataset. Existing generation techniques, by contrast, typically optimize each batch or each image independently to match the BN statistics, ignoring the global statistics and diversity of the full synthetic set.
2. **Ignored data augmentation**: The augmentations applied during training shape the BN statistics, but synthetic data generation usually overlooks their effect.
3. **Output distribution mismatch**: The final layers of a model usually have no BN layers, so the feature map distributions of synthetic and real data diverge there. This is particularly harmful for hardware-friendly quantization, where those final layers are quantized as well.

To close these gaps, the authors propose **DGH (Data Generation for Hardware-friendly quantization)**, which improves data generation in three ways:

- **Global statistics aggregation**: DGH jointly optimizes all generated images rather than optimizing them batch by batch or image by image, regardless of the image set size or GPU memory constraints. As a result, the synthetic set as a whole reflects the statistics of the training dataset more faithfully.
- **Data augmentation preprocessing**: DGH adds a preprocessing stage that mimics the augmentations applied during training and improves image quality by incorporating natural image priors.
- **Output distribution stretching loss (ODSL)**: DGH introduces a new loss, applied to the model's output, that aligns the support of the feature map distribution of synthetic data with that of real data. This makes the generated data better suited to quantizing the final layers, and the loss can be adapted to different tasks.

Experimental results show that DGH significantly improves quantization performance across multiple tasks, including classification and object detection, and often performs on par with real data. In hardware-friendly ZSQ, DGH improves accuracy by up to 30%.

In summary, the paper targets the limitations of existing ZSQ methods in hardware-friendly quantization. It does so by improving how synthetic data is generated, yielding more accurate fully quantized models without access to real data.
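The global statistics aggregation described above can be illustrated with a minimal NumPy sketch. It rests on several assumptions. The BN-matching loss is taken to compare per-channel means and variances of a layer's activations with that layer's stored BN statistics; this is a DeepInversion/ZeroQ-style form and not necessarily DGH's exact loss. Activations are assumed to arrive in memory-bounded batches. Because only running sums (count, sum, sum of squares) are kept, the statistics of the whole synthetic set can be computed exactly no matter how many batches it spans. The names `aggregate_channel_stats` and `bn_matching_loss` are hypothetical. A real generator would also need to propagate the loss gradient back to every image, which this sketch omits.

```python
import numpy as np


def aggregate_channel_stats(batches):
    """Per-channel mean and (biased) variance over all batches of NCHW activations.

    Only running sums are stored, so memory does not grow with the number of images.
    For data with large offsets, a Welford/Chan-style update is numerically safer.
    """
    count, total, total_sq = 0, None, None
    for feats in batches:
        # Move channels first and flatten batch and spatial dims: shape (C, N*H*W)
        per_channel = feats.transpose(1, 0, 2, 3).reshape(feats.shape[1], -1)
        if total is None:
            total = np.zeros(per_channel.shape[0])
            total_sq = np.zeros(per_channel.shape[0])
        count += per_channel.shape[1]
        total += per_channel.sum(axis=1)
        total_sq += (per_channel ** 2).sum(axis=1)
    mean = total / count
    var = total_sq / count - mean ** 2
    return mean, var


def bn_matching_loss(mean, var, bn_mean, bn_var):
    # Hypothetical DeepInversion-style loss: L2 gap of means plus L2 gap of variances
    return float(np.linalg.norm(mean - bn_mean) + np.linalg.norm(var - bn_var))


rng = np.random.default_rng(0)
# Stand-in activations of one layer for 64 synthetic images, split into 8 batches of 8
batches = [rng.normal(loc=0.1, scale=1.2, size=(8, 4, 5, 5)) for _ in range(8)]

mean, var = aggregate_channel_stats(batches)
loss = bn_matching_loss(mean, var, bn_mean=np.zeros(4), bn_var=np.ones(4))
print(f"global BN-matching loss: {loss:.3f}")
```

Per-batch matching, by contrast, would call `aggregate_channel_stats([b])` separately for each batch `b` and optimize each loss in isolation. Each batch would then be pulled toward the BN statistics on its own, and no term would ever see the set as a whole.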
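The augmentation-mimicking preprocessing described above can be sketched in the same spirit. The summary does not say which augmentations DGH reproduces. This sketch assumes the common ImageNet-style pair of random horizontal flip and random resized crop. Each synthetic image is transformed before it is fed to the model, so the matched statistics reflect augmented inputs, as the BN statistics did during training. The function name `mimic_training_augmentation` and the `scale` range are hypothetical. The crop keeps the image's aspect ratio, resizing uses nearest neighbour for brevity, and the natural image prior is not modeled.

```python
import numpy as np


def mimic_training_augmentation(img, rng, out_size, scale=(0.25, 1.0)):
    """Random horizontal flip + random resized crop of a (C, H, W) image (hypothetical sketch)."""
    c, h, w = img.shape
    if rng.random() < 0.5:
        img = img[:, :, ::-1]  # horizontal flip
    # Crop a random fraction of the image area, keeping the original aspect ratio
    side_frac = np.sqrt(rng.uniform(*scale))
    crop_h = max(1, int(round(h * side_frac)))
    crop_w = max(1, int(round(w * side_frac)))
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    crop = img[:, top:top + crop_h, left:left + crop_w]
    # Nearest-neighbour resize of the crop back to out_size x out_size
    rows = np.arange(out_size) * crop_h // out_size
    cols = np.arange(out_size) * crop_w // out_size
    return crop[:, rows[:, None], cols[None, :]]


rng = np.random.default_rng(1)
synthetic_batch = rng.normal(size=(4, 3, 32, 32))  # stand-in for images being optimized
augmented = np.stack([mimic_training_augmentation(x, rng, out_size=32) for x in synthetic_batch])
print(augmented.shape)  # (4, 3, 32, 32)
```

Drawing a fresh flip and crop at every optimization step exposes each synthetic image to many augmented views, roughly as each training image was seen during training. In a gradient-based generator, the same flip and crop-by-indexing would be written with differentiable tensor operations so gradients flow back to the images.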