LSMQ: A Layer-Wise Sensitivity-Based Mixed-Precision Quantization Method for Bit-Flexible CNN Accelerator

Yimin Huang, Kai Chen, Zhuang Shao, Yichuan Bai, Yafeng Huang, Yuan Du, Li Du, Zhongfeng Wang
DOI: https://doi.org/10.1109/ISOCC53507.2021.9613969
2021-01-01
Abstract: Model quantization is a prevailing way to accelerate convolutional neural networks (CNNs). Mixed-precision quantization tends to compress the model better and further improve computation efficiency. However, it is challenging to identify the optimal bit width for each layer. In this paper, we propose LSMQ, a mixed-precision quantization method based on layer-wise sensitivity. We first calculate the sensitivity of each layer; the weights of each layer are then automatically quantized, without retraining, at a precision determined by the sensitivity ranking and a valid search strategy. Moreover, we present a bit-flexible CNN accelerator that efficiently supports data operations with varying bit widths after mixed-precision quantization. Experiments show that the top-1 accuracy of VGG16 with LSMQ is 7.31% higher, while the model size is 3.4% smaller, compared with previous work.
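The general flow described in the abstract (score each layer, rank the layers, then pick a bit width per layer without retraining) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not define the sensitivity metric or the search strategy, so the sketch assumes a weight-space mean squared quantization error as the sensitivity proxy and a simple greedy search under an error budget. The function names `quantize`, `sensitivity`, and `assign_bits`, and the parameters `high_bits`, `low_bits`, and `error_budget`, are hypothetical.

```python
import numpy as np

def quantize(w, bits):
    """Symmetric uniform quantization of a weight array to `bits` bits."""
    levels = 2 ** (bits - 1) - 1          # e.g. 7 positive levels for 4 bits
    max_abs = np.max(np.abs(w))
    if max_abs == 0:
        return w.copy()
    scale = max_abs / levels
    return np.round(w / scale) * scale

def sensitivity(w, probe_bits=4):
    """Assumed proxy (not from the paper): mean squared quantization error."""
    return float(np.mean((w - quantize(w, probe_bits)) ** 2))

def assign_bits(layers, high_bits=8, low_bits=4, error_budget=1e-3):
    """Greedy search (an assumption): lower the least sensitive layers first
    while the accumulated proxy error stays within `error_budget`."""
    scores = {name: sensitivity(w, low_bits) for name, w in layers.items()}
    ranked = sorted(layers, key=lambda name: scores[name])
    bits = {name: high_bits for name in layers}
    total = 0.0
    for name in ranked:
        if total + scores[name] > error_budget:
            break                          # remaining layers are more sensitive
        bits[name] = low_bits
        total += scores[name]
    return bits
```

Calling `assign_bits` with a dictionary that maps layer names to weight arrays returns a dictionary of per-layer bit widths. A practical method would likely measure sensitivity through the effect on the network's output or loss rather than on the weights alone.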