Abstract: Block-wise reconstruction with adaptive rounding helps achieve acceptable 4-bit post-training quantization accuracy. However, adaptive rounding is time-intensive, and it constrains the optimization space of each weight element to a binary set, which limits the performance of quantized models. Moreover, the optimality of block-wise reconstruction requires that subsequent network blocks remain unquantized. To address these issues, we propose a two-stage post-training quantization scheme, AE-Qdrop, encompassing block-wise reconstruction and global fine-tuning. In the block-wise reconstruction stage, a progressive optimization strategy is introduced as a replacement for adaptive rounding, enhancing both quantization accuracy and efficiency. Additionally, the integration of randomly weighted quantized activation helps mitigate the risk of overfitting. In the global fine-tuning stage, the weights of each quantized network block are corrected simultaneously through logit matching and feature matching. Experiments in image classification and object detection tasks validate that AE-Qdrop achieves accurate and efficient quantization. For the 2-bit MobileNetV2, AE-Qdrop outperforms Qdrop in quantization accuracy by 6.26%, and its quantization efficiency is fivefold higher.
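
The "binary set" constraint mentioned in the abstract can be illustrated with a minimal sketch. It assumes a plain asymmetric uniform quantizer with min/max calibration, which the abstract does not specify; the function names (`quantize_params`, `fake_quant`, `rounding_candidates`) are hypothetical and not taken from the paper. The point is only that, once the scale is fixed, adaptive rounding chooses between two values per weight: the grid point below it and the grid point above it.

```python
import math

def quantize_params(weights, n_bits):
    """Scale and zero point for an asymmetric uniform quantizer.

    Min/max calibration is an assumption made for this sketch.
    """
    w_min, w_max = min(weights), max(weights)
    levels = 2 ** n_bits - 1
    scale = (w_max - w_min) / levels or 1.0  # fall back to 1.0 for constant weights
    zero_point = round(-w_min / scale)
    return scale, zero_point

def fake_quant(w, scale, zero_point, n_bits, round_up=None):
    """Quantize then dequantize one weight.

    round_up=None  -> round to nearest
    round_up=False -> round down (floor)
    round_up=True  -> round up (floor + 1)
    """
    x = w / scale
    if round_up is None:
        q = math.floor(x + 0.5)
    else:
        q = math.floor(x) + (1 if round_up else 0)
    # Clamp to the representable integer range [0, 2^n_bits - 1].
    q = max(0, min(2 ** n_bits - 1, q + zero_point))
    return (q - zero_point) * scale

def rounding_candidates(w, scale, zero_point, n_bits):
    """The two values an up/down rounding decision can pick for one weight."""
    down = fake_quant(w, scale, zero_point, n_bits, round_up=False)
    up = fake_quant(w, scale, zero_point, n_bits, round_up=True)
    return down, up

if __name__ == "__main__":
    weights = [-0.5, -0.1, 0.2, 0.7]  # toy values, not from the paper
    scale, zp = quantize_params(weights, n_bits=2)
    for w in weights:
        print(w, rounding_candidates(w, scale, zp, 2), fake_quant(w, scale, zp, 2))
```

Round-to-nearest always lands on one of the two candidates, so a method that only learns the up/down decision can never move a weight further than one quantization step from its starting point. This is the restriction the abstract says the progressive optimization strategy is meant to lift; how that strategy actually works is not described in this text.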

What problem does this paper attempt to address?

This paper addresses the accuracy loss and inefficiency that arise in low-bit quantization of convolutional neural networks (CNNs). It identifies two limitations of current techniques. First, adaptive rounding can improve quantization accuracy, but its optimization space is limited to a binary set, and it needs many iterations to gradually weaken the regularization constraint, which significantly reduces quantization efficiency. Second, block-wise reconstruction assumes that subsequent network blocks are not quantized, whereas in practice all network blocks must be quantized, so the result for each block may be sub-optimal.

To overcome these problems, the paper proposes a two-stage post-training quantization scheme, AE-Qdrop, consisting of block-wise reconstruction and global fine-tuning. In the block-wise reconstruction stage, the Progressive Optimization Strategy (POS) replaces adaptive rounding, improving both quantization accuracy and efficiency, and Randomly Weighted Quantized Activation (RWQA) increases the diversity of activations, improving the generalization of the quantized model. In the global fine-tuning stage, the weights of all quantized network blocks are corrected simultaneously through feature matching and logit matching, further improving quantization accuracy.

The main contributions of the paper are as follows:
1. A theoretical analysis of the limitations of adaptive rounding and block-wise reconstruction.
2. The AE-Qdrop algorithm, which combines the Progressive Optimization Strategy and Randomly Weighted Quantized Activation to improve the accuracy and efficiency of block-wise reconstruction, and then further improves overall quantization accuracy through global fine-tuning of the weights.
3. Extensive experiments on mainstream networks that verify the quantization accuracy and efficiency of AE-Qdrop.

For example, for MobileNetV2 with 2-bit quantization, the quantization accuracy of AE-Qdrop is 6.26% higher than that of Qdrop, and its quantization efficiency is 5 times that of Qdrop. These improvements make AE-Qdrop an efficient and accurate low-bit quantization method suited to resource-constrained mobile devices.
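
The global fine-tuning objective, which combines logit matching and feature matching, might look roughly like the sketch below. The specific choices are assumptions made for illustration, since the text above does not give the paper's formulas: KL divergence for logit matching, mean squared error per block for feature matching, and a single hypothetical `feature_weight` balancing the two. All function names are likewise hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def logit_matching_loss(fp_logits, q_logits):
    """KL(p_fp || p_q) between full-precision and quantized model outputs.

    KL divergence is an assumption; the text only says "logit matching".
    """
    log_p = log_softmax(fp_logits)
    log_q = log_softmax(q_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def feature_matching_loss(fp_features, q_features):
    """Mean squared error between block outputs, averaged over blocks.

    Each argument is a list with one flattened feature vector per block.
    """
    per_block = [
        sum((a - b) ** 2 for a, b in zip(fp, q)) / len(fp)
        for fp, q in zip(fp_features, q_features)
    ]
    return sum(per_block) / len(per_block)

def global_finetune_loss(fp_logits, q_logits, fp_features, q_features,
                         feature_weight=1.0):
    """Combined objective; feature_weight is a hypothetical balancing factor."""
    return (logit_matching_loss(fp_logits, q_logits)
            + feature_weight * feature_matching_loss(fp_features, q_features))
```

In such a setup the full-precision network acts as a fixed teacher: its logits and intermediate block outputs are computed once per calibration batch, and only the quantized network's weights receive gradient updates. Because every block is already quantized at this stage, the loss sees the interaction between blocks that block-wise reconstruction alone ignores.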

AE-Qdrop: Towards Accurate and Efficient Low-Bit Post-Training Quantization for A Convolutional Neural Network
