General Bitwidth Assignment for Efficient Deep Convolutional Neural Network Quantization

Wen Fei,Wenrui Dai,Chenglin Li,Junni Zou,Hongkai Xiong
DOI: https://doi.org/10.1109/tnnls.2021.3069886
IF: 14.255
2021-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract:Model quantization is essential to deploy deep convolutional neural networks (DCNNs) on resource-constrained devices. In this article, we propose a general bitwidth assignment algorithm based on theoretical analysis for efficient layerwise weight and activation quantization of DCNNs. The proposed algorithm develops a prediction model to explicitly estimate the loss of classification accuracy led by weight quantization with a geometrical approach. Consequently, dynamic programming is adopted to achieve optimal bitwidth assignment on weights based on the estimated error. Furthermore, we optimize bitwidth assignment for activations by considering the signal-to-quantization-noise ratio (SQNR) between weight and activation quantization. The proposed algorithm is general to reveal the tradeoff between classification accuracy and model size for various network architectures. Extensive experiments demonstrate the efficacy of the proposed bitwidth assignment algorithm and the error rate prediction model. Furthermore, the proposed algorithm is shown to be well extended to object detection.
What problem does this paper attempt to address?