Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss

Jung Hyun Lee, Jihun Yun, Sung Ju Hwang, Eunho Yang
DOI: https://doi.org/10.48550/arXiv.2109.02100
2021-09-05
Abstract: Network quantization, which aims to reduce the bit-lengths of the network weights and activations, has emerged for their deployments to resource-limited devices. Although recent studies have successfully discretized a full-precision network, they still incur large quantization errors after training, thus giving rise to a significant performance gap between a full-precision network and its quantized counterpart. In this work, we propose a novel quantization method for neural networks, Cluster-Promoting Quantization (CPQ) that finds the optimal quantization grids while naturally encouraging the underlying full-precision weights to gather around those quantization grids cohesively during training. This property of CPQ is thanks to our two main ingredients that enable differentiable quantization: i) the use of the categorical distribution designed by a specific probabilistic parametrization in the forward pass and ii) our proposed multi-class straight-through estimator (STE) in the backward pass. Since our second component, multi-class STE, is intrinsically biased, we additionally propose a new bit-drop technique, DropBits, that revises the standard dropout regularization to randomly drop bits instead of neurons. As a natural extension of DropBits, we further introduce the way of learning heterogeneous quantization levels to find proper bit-length for each layer by imposing an additional regularization on DropBits. We experimentally validate our method on various benchmark datasets and network architectures, and also support a new hypothesis for quantization: learning heterogeneous quantization levels outperforms the case using the same but fixed quantization levels from scratch.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper aims to reduce quantization loss in neural network quantization while maintaining performance at low bit-widths. It proposes a new quantization method, Cluster-Promoting Quantization (CPQ), which finds the optimal quantization grids and naturally encourages the full-precision weights to cluster around those grids during training, thereby reducing quantization errors. To further improve performance, the paper also proposes a bit-drop technique called DropBits, which reduces the bias of the multi-class straight-through estimator (multi-class STE) by randomly dropping bits instead of neurons. Finally, the paper explores learning heterogeneous quantization levels so that different layers can use suitable bit-widths, making more efficient use of resources.

### Main contributions of the paper:

1. **Cluster-Promoting Quantization (CPQ)**:
   - A new quantization method is proposed that not only finds the optimal quantization grids but also encourages full-precision weights to cluster around these grids at low bit-widths.
   - Combining a specific probabilistic parametrization with the multi-class straight-through estimator (multi-class STE) yields better clustering and better final performance.
2. **DropBits**:
   - A new bit-drop technique is proposed that reduces the bias of the multi-class STE by randomly dropping bits.
   - It is inspired by Dropout, but drops bits in the quantization process instead of neurons.
3. **Heterogeneous quantization**:
   - An additional regularization is introduced that allows learning different bit-widths for each layer or channel, making more efficient use of resources.
   - It is verified that learning heterogeneous quantization levels results in better network performance than training a network from scratch with the same fixed quantization level.

### Experimental results:

- The paper has carried out extensive experiments on multiple benchmark datasets, including MNIST, CIFAR-10 and ImageNet, verifying the effectiveness of CPQ and DropBits.
- On ResNet-18 and MobileNetV2, CPQ + DropBits achieved new state-of-the-art results when all layers were uniformly quantized.
- The heterogeneous quantization method still achieved satisfactory results when using at most 4 bits for all layers, verifying the new quantization hypothesis.

### Formula presentation:

- **Quantization probability calculation**:
  \[ \pi_i=\text{Sigmoid}\left(\frac{g_i+\frac{\alpha}{2}-x}{\sigma}\right)-\text{Sigmoid}\left(\frac{g_i-\frac{\alpha}{2}-x}{\sigma}\right) \]
  where \( g_i \) is the quantization grid, \( \alpha \) controls the grid spacing size, and \( \sigma \) is the standard deviation.
- **Multi-class Straight-Through Estimator**:
  - Forward propagation: \[ y = \text{onehot}(\arg\max_i\pi_i) \]
  - Backward propagation: \[ \frac{\partial L}{\partial\pi_{i_{\text{max}}}}=\frac{\partial L}{\partial y_{i_{\text{max}}}}, \quad \frac{\partial L}{\partial\pi_i} = 0\quad\forall i\neq i_{\text{max}} \]
- **DropBits' binary mask generation**:
  \[ U_k\sim\text{Uniform}(0, 1) \]
  \[ S_k=\text{Sigmoid}\left(\frac{\lo