Abstract: Neural network quantization aims to reduce model size, computational complexity, and memory consumption by mapping weights and activations from full precision to low precision. However, many existing quantization methods, whether post-training quantization with calibration or quantization-aware training with fine-tuning, require the original data to perform well, and that data may be unavailable due to confidentiality or privacy constraints. Without it, performance can decline significantly. In this paper, we propose a universal and effective method called Generative Data Free Model Quantization with Knowledge Matching for Classification (KMDFQ), which removes the dependence on data for neural network quantization. To achieve this, we propose a knowledge matching generator that produces meaningful fake data from the latent knowledge in the pre-trained model, including classification boundary knowledge and data distribution information. Building on this generator, we propose a fake-data-driven data-free quantization method that uses the generated data to exploit this latent knowledge during quantization. Furthermore, we introduce Mean Square Error alignment during fine-tuning of the quantized model so that it learns this knowledge more strictly and directly, which makes the approach better suited to data-free quantization. Extensive experiments on image classification demonstrate the effectiveness of our method, which achieves higher accuracy than existing data-free quantization methods, particularly as the quantization bit-width decreases. For example, on ImageNet, the 4-bit data-free quantized ResNet-18 loses less than 1.2% accuracy compared with quantization using real data. The source code is available at https://github.com/ZSHsh98/KMDFQ.
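
To make the two core ideas in the abstract concrete (mapping full-precision weights to a low bit-width, and aligning the quantized model's outputs to the full-precision model's outputs with a Mean Square Error), the following is a minimal toy sketch. It is not the KMDFQ implementation: the min-max uniform quantizer, the single linear layer, the fixed `fake_input` vector standing in for a generated sample, and all function names (`quantize_uniform`, `alignment_loss`, and so on) are assumptions chosen purely for illustration.

```python
def quantize_uniform(values, bits):
    """Snap floats to 2**bits evenly spaced levels spanning [min, max].

    Assumption: a plain min-max uniform quantizer, used only for illustration.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    scale = (hi - lo) / (2 ** bits - 1)
    return [lo + round((v - lo) / scale) * scale for v in values]


def quantize_layer(weights, bits):
    """Quantize a weight matrix (list of rows) with one shared min-max range."""
    n_cols = len(weights[0])
    flat = quantize_uniform([w for row in weights for w in row], bits)
    return [flat[i:i + n_cols] for i in range(0, len(flat), n_cols)]


def linear(weights, x):
    """Toy model: a single linear layer producing one logit per row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical full-precision weights and a stand-in for one generated sample.
full_weights = [[0.9, -0.4, 0.2], [-0.7, 0.3, 0.5]]
fake_input = [1.0, 2.0, -1.0]


def alignment_loss(bits):
    """MSE between full-precision logits and quantized-model logits."""
    quant_weights = quantize_layer(full_weights, bits)
    return mse(linear(full_weights, fake_input), linear(quant_weights, fake_input))


if __name__ == "__main__":
    for bits in (8, 4, 2):
        print(bits, "bits -> alignment MSE", alignment_loss(bits))
```

In this sketch the alignment error grows as the bit-width shrinks, which is the regime the abstract highlights. In a fine-tuning setting, a loss of this form would be minimized over the quantized model's parameters using generated rather than fixed inputs.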
