Abstract: Post-Training Quantization (PTQ) has received significant attention because it requires only a small set of calibration data to quantize a full-precision model, which is more practical in real-world applications in which full access to a large training set is not available. However, it often leads to overfitting on the small calibration dataset. Several methods have been proposed to address this issue, yet they still rely on only the calibration set for the quantization and they do not validate the quantized model due to the lack of a validation set. In this work, we propose a novel meta-learning based approach to enhance the performance of post-training quantization. Specifically, to mitigate the overfitting problem, instead of only training the quantized model using the original calibration set without any validation during the learning process as in previous PTQ works, in our approach, we both train and validate the quantized model using two different sets of images. In particular, we propose a meta-learning based approach to jointly optimize a transformation network and a quantized model through bi-level optimization. The transformation network modifies the original calibration data and the modified data will be used as the training set to learn the quantized model with the objective that the quantized model achieves a good performance on the original calibration data. Extensive experiments on the widely used ImageNet dataset with different neural network architectures demonstrate that our approach outperforms the state-of-the-art PTQ methods.
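
One way to read the bi-level optimization described in the abstract is sketched below. The notation is an assumption made here for illustration and is not taken from the paper: $\phi$ denotes the parameters of the transformation network $T_{\phi}$, $\theta$ the parameters of the quantized model, $\mathcal{D}_c$ the original calibration set, and $\mathcal{L}_{\text{train}}$ and $\mathcal{L}_{\text{val}}$ unspecified training and validation losses.

$$
\min_{\phi}\; \mathcal{L}_{\text{val}}\bigl(\theta^{*}(\phi);\, \mathcal{D}_c\bigr)
\quad \text{s.t.} \quad
\theta^{*}(\phi) = \arg\min_{\theta}\; \mathcal{L}_{\text{train}}\bigl(\theta;\, T_{\phi}(\mathcal{D}_c)\bigr)
$$

The inner problem trains the quantized model on the transformed calibration data, and the outer problem adjusts the transformation network so that the resulting quantized model performs well on the untouched calibration data. Any additional terms the authors use to regularize the transformation network are omitted from this sketch.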

What problem does this paper attempt to address?

The paper primarily addresses the issue of deploying deep neural networks (DNNs) on resource-constrained devices, particularly focusing on how to effectively reduce model size while maintaining high prediction accuracy. Specifically, the research focuses on the overfitting problem in the **Post-Training Quantization (PTQ)** process.

### Research Background and Objectives

1. **Quantization Techniques**: To reduce the memory footprint and computational cost of DNN models, researchers have developed network quantization techniques, mainly Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ).
2. **Challenges of PTQ**: PTQ requires only a small amount of calibration data to quantize the model, which makes it more practical in real-world applications, but the quantized model tends to overfit to that calibration dataset.
3. **Research Objective of the Paper**: Propose a meta-learning-based method, MetaAug, that improves post-training quantization by reducing overfitting and improving the generalization ability of the quantized model.

### Main Contributions

1. **Proposing the MetaAug Method**: MetaAug adopts a meta-learning framework that jointly optimizes a transformation network and a quantized model. The transformation network modifies the original calibration data, and the modified data is used to train the quantized model; the original calibration data serves as a validation set to check that the quantized model does not overfit.
2. **Improving the Transformation Network**: To prevent the transformation network from degenerating into a simple identity mapping while still retaining the information of the original calibration data, the researchers introduce several loss functions, including a probabilistic knowledge transfer loss and a margin loss.
3. **Experimental Results**: Extensive experiments on the ImageNet dataset show that MetaAug outperforms existing post-training quantization methods across different neural network architectures, especially in low-bit quantization settings.

In summary, this paper aims to address the common overfitting problem in post-training quantization by introducing meta-learning techniques that improve the generalization ability of quantized models, with gains reported on ImageNet across several architectures.
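
To make the calibration overfitting problem concrete, here is a minimal sketch in plain Python. It is not MetaAug and not the paper's quantizer. It uses a generic min-max uniform quantizer as a stand-in, and the function names and toy data values are hypothetical. It shows how quantization parameters fitted to a small calibration set can give low error on that set while clipping unseen values that fall outside the calibrated range.

```python
def calibrate_min_max(samples, num_bits=8):
    """Derive (scale, zero_point, qmax) for an asymmetric uniform quantizer
    from the min and max of the calibration samples."""
    lo, hi = min(samples), max(samples)
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    qmax = 2 ** num_bits - 1
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point, qmax


def fake_quantize(x, scale, zero_point, qmax):
    """Map x to an integer level, clamp to [0, qmax], and map back to a float."""
    q = round(x / scale) + zero_point
    q = max(0, min(qmax, q))
    return (q - zero_point) * scale


def mean_squared_error(values, scale, zero_point, qmax):
    """Average squared quantization error over a list of values."""
    errs = [(v - fake_quantize(v, scale, zero_point, qmax)) ** 2 for v in values]
    return sum(errs) / len(errs)


# Toy data (made up for illustration): a narrow calibration set and
# unseen data that covers a wider range.
calibration = [-0.5, -0.1, 0.0, 0.2, 0.4]
held_out = [-1.2, -0.3, 0.1, 0.9, 1.5]

params = calibrate_min_max(calibration, num_bits=4)
calib_err = mean_squared_error(calibration, *params)
held_out_err = mean_squared_error(held_out, *params)

print(f"calibration MSE: {calib_err:.5f}")
print(f"held-out MSE:    {held_out_err:.5f}")
```

The held-out error is much larger because values beyond the calibrated range are clamped. Validating the quantized model on data other than what it was fitted to, as the summarized approach does with transformed versus original calibration images, is one way to detect this kind of gap.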

MetaAug: Meta-Data Augmentation for Post-Training Quantization

Hessian-based Mixed-Precision Quantization with Transition Aware Training for Neural Networks

Pse: Mixed Quantization Framework of Neural Networks for Efficient Deployment

EPTQ: Enhanced Post-Training Quantization via Hessian-guided Network-wise Optimization

Optimization-based Post-training Quantization with Bit-split and Stitching

Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective

Automatic low-bit hybrid quantization of neural networks through meta learning

Attention Round for post-training quantization

Bit-shrinking: Limiting Instantaneous Sharpness for Improving Post-training Quantization

Attention-aware Post-training Quantization without Backpropagation

Rate-Distortion Optimized Post-Training Quantization for Learned Image Compression

Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers

PTQ4ViT: Post-training quantization for vision transformers with twin uniform quantization

Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting

AffineQuant: Affine Transformation Quantization for Large Language Models

Error-aware Quantization through Noise Tempering

RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization

SelectQ: Calibration Data Selection for Post-Training Quantization

Post-training Quantization or Quantization-aware Training? That is the Question

L4Q: Parameter Efficient Quantization-Aware Fine-Tuning on Large Language Models

Optimizing Large Language Models through Quantization: A Comparative Analysis of PTQ and QAT Techniques