Mixed-precision Deep Neural Network Quantization With Multiple Compression Rates

Xuanda Wang, Wen Fei, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong
DOI: https://doi.org/10.1109/DCC55655.2023.00075
2023-01-01
Abstract: Quantizing a single deep neural network to multiple compression rates (precisions) has recently been considered for flexible deployment in real-world scenarios. In this paper, we propose a novel scheme that combines progressive bit-width allocation and joint training to simultaneously optimize mixed-precision quantized networks under multiple compression rates. Specifically, we develop a progressive bit-width allocation with switchable quantization step sizes to enable mixed-precision quantization based on the analytic sensitivity of network layers under multiple compression rates. Furthermore, we jointly train the quantized networks under different compression rates via knowledge distillation to exploit their correlations based on the shared network structure. Experimental results show that the proposed scheme outperforms AdaBits [1] on various networks on CIFAR-10 and ImageNet.
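To make the idea of a switchable quantization step size concrete, the following is a minimal sketch of a uniform symmetric quantizer that keeps one step size per supported bit-width and switches between them at run time. It is an illustration only, not the paper's algorithm: the abstract does not say how step sizes are obtained or how bit-widths are allocated, so the names `quantize`, `quantize_switchable`, and `STEP_SIZES`, as well as the step-size values, are assumptions chosen for the example.

```python
def quantize(weights, bits, step):
    """Uniform symmetric quantization.

    Each weight is rounded to the nearest multiple of `step`, and the
    integer level is clamped to the signed range of a `bits`-bit integer.
    """
    qmin = -(2 ** (bits - 1))
    qmax = 2 ** (bits - 1) - 1
    out = []
    for w in weights:
        level = round(w / step)              # nearest integer level
        level = max(qmin, min(qmax, level))  # clamp to the representable range
        out.append(level * step)             # map back to a real value
    return out


# Hypothetical step sizes for one layer, one entry per supported bit-width.
# These values are assumptions for illustration, not taken from the paper.
STEP_SIZES = {2: 0.5, 4: 0.125, 8: 0.0078125}


def quantize_switchable(weights, bits):
    """Quantize the same shared weights with the step size for `bits`."""
    return quantize(weights, bits, STEP_SIZES[bits])


if __name__ == "__main__":
    shared_weights = [0.3, -0.72, 0.05, 1.4]
    for bits in (2, 4, 8):
        print(bits, quantize_switchable(shared_weights, bits))
```

The same `shared_weights` list serves every precision, and only the step size and clamping range change with the bit-width. A mixed-precision variant would keep a separate table like `STEP_SIZES` for each layer and choose `bits` per layer.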