GDRQ: Group-based Distribution Reshaping for Quantization

Haibao Yu, Tuopu Wen, Guangliang Cheng, Jiankai Sun, Qi Han, Jianping Shi
DOI: https://doi.org/10.48550/arXiv.1908.01477
2019-08-05
Abstract: Maintaining high performance under low-bit quantization is challenging because model capacity is limited (e.g., 4-bit for both weights and activations). The distributions of both weights and activations in deep neural networks are naturally Gaussian-like. Nevertheless, due to the limited bitwidth of low-bit models, uniform-like distributed weights and activations have been shown to be more friendly to quantization while preserving accuracy [Han2015Learning]. Motivated by this, we propose Scale-Clip, a Distribution Reshaping technique that dynamically reshapes weights or activations into a uniform-like distribution. Furthermore, to increase the capability of a low-bit model, we propose a novel Group-based Quantization algorithm that splits the filters into several groups. Different groups can learn different quantization parameters, which can be elegantly merged into the batch normalization layer without extra computational cost at inference. Finally, we integrate the Scale-Clip technique with the Group-based Quantization algorithm and propose the Group-based Distribution Reshaping Quantization (GDRQ) framework to further improve quantization performance. Experiments on various networks (e.g., VGGNet and ResNet) and vision tasks (e.g., classification, detection, and segmentation) demonstrate that our framework achieves good performance.
Computer Vision and Pattern Recognition, Machine Learning, Image and Video Processing
What problem does this paper attempt to address?
This paper addresses how to keep deep neural networks accurate under low-bit quantization (for example, 4-bit weights and activations). The authors point out that, because a low-bit model has very few quantization levels, uniform-like weights and activations are easier to quantize without losing accuracy than Gaussian-like ones. In practice, however, the weights and activations of deep neural networks mostly follow Gaussian or Laplacian distributions, which leads to a large quantization loss. The authors propose two main strategies to solve this problem:

1. **Distribution Reshaping (DR)**: A method called Scale-Clip dynamically reshapes the distribution of weights and activations toward a uniform distribution, reducing the quantization loss and improving the performance of low-bit models.
2. **Group-based Quantization (GQ)**: The convolutional filters are divided into multiple groups, and each group can learn its own quantization parameters. This increases the expressive ability of low-bit models without adding computational cost at inference.

Combining the two methods, the authors propose the **Group-based Distribution Reshaping Quantization (GDRQ)** framework to further improve the performance of low-bit quantization. The experimental results show that the framework outperforms existing methods on multiple networks (such as VGGNet and ResNet) and vision tasks (such as classification, detection, and segmentation). In particular, a ResNet-50 model with 2-bit weights and 4-bit activations loses less than 1% accuracy on the ImageNet classification task, which was the best-known result at that time.
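The two ideas can be illustrated with a small NumPy sketch. It is not the paper's implementation: the summary does not spell out the Scale-Clip formula, so the clipping rule below (a threshold of `k` times the mean absolute weight) is an assumption, and every function name is hypothetical. The first part checks that clipping a Gaussian-like tensor spreads its values more evenly over the quantization levels; the second checks that giving each filter group its own scale lowers the quantization error when filters differ in magnitude. Learning the quantization parameters and merging them into batch normalization are not shown.

```python
import numpy as np

def uniform_quantize(x, num_bits, scale):
    """Symmetric uniform quantizer on [-scale, scale]; returns de-quantized values.
    Assumes num_bits >= 2 and scale > 0."""
    n = 2 ** (num_bits - 1) - 1  # positive levels, e.g. 7 for 4-bit
    q = np.round(np.clip(x / scale, -1.0, 1.0) * n)
    return q / n * scale

def level_entropy(x, num_bits, scale):
    """Entropy (in nats) of quantization-level usage; higher means more even usage."""
    n = 2 ** (num_bits - 1) - 1
    q = np.round(np.clip(x / scale, -1.0, 1.0) * n).astype(int) + n  # indices 0..2n
    p = np.bincount(q.ravel(), minlength=2 * n + 1) / q.size
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def clip_to_mean_multiple(w, k=2.0):
    """Hypothetical reshaping rule (an assumption): clip to +/- k * mean(|w|)."""
    t = k * np.abs(w).mean()
    return np.clip(w, -t, t), t

def groupwise_quantize(w, num_bits, num_groups):
    """Quantize filters (axis 0) in groups, each group using its own max-abs scale."""
    out = np.empty_like(w)
    for idx in np.array_split(np.arange(w.shape[0]), num_groups):
        scale = np.abs(w[idx]).max()
        out[idx] = uniform_quantize(w[idx], num_bits, scale)
    return out

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 512))  # 8 "filters" with Gaussian-like weights

# Distribution reshaping: compare level usage before and after clipping.
plain_entropy = level_entropy(w, 4, np.abs(w).max())
w_clipped, threshold = clip_to_mean_multiple(w, k=2.0)
clipped_entropy = level_entropy(w_clipped, 4, threshold)

# Group-based quantization: filters with very different magnitudes.
filter_scales = np.repeat([0.05, 0.2, 1.0, 4.0], 2)[:, None]
w_mixed = w * filter_scales
w_tensor_q = uniform_quantize(w_mixed, 4, np.abs(w_mixed).max())
w_group_q = groupwise_quantize(w_mixed, 4, num_groups=4)
err_tensor = float(np.mean((w_mixed - w_tensor_q) ** 2))
err_group = float(np.mean((w_mixed - w_group_q) ** 2))

print("level-usage entropy, plain vs clipped:", plain_entropy, clipped_entropy)
print("quantization MSE, per-tensor vs per-group:", err_tensor, err_group)
```

Entropy of level usage is used here only as a convenient proxy for "uniform-like"; a single shared scale is compared against per-group max-abs scales because that is the simplest way to show why separate quantization parameters per group help.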