DQI: A Dynamic Quantization Method for Efficient Convolutional Neural Network Inference Accelerators

Qiang Liu, Yun Wang, Shun Yan
DOI: https://doi.org/10.1109/FCCM53951.2022.9786195
2022-05-15
Abstract: Post-training compression with quantization is a common technique for improving the efficiency of embedded neural network accelerators. This paper proposes a Dynamic Quantization in Inference (DQI) method to address the severe quantization overflow that can occur during CNN inference. Based on an analysis of the quantization errors of activation values in convolutional layers, efficient quantization overflow detection and dynamic updating of quantization parameters are designed and implemented in a CNN accelerator. Evaluation on the VGG16 and MobileNetV2 models shows that DQI improves inference accuracy by up to 11.59% in high-overflow scenarios, with acceptable overhead in hardware resources and runtime.
Computer Science, Engineering
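
The abstract describes two mechanisms: detecting quantization overflow in activation values and dynamically updating the quantization parameters. The following is a minimal software sketch of that general idea, not the authors' hardware design. It assumes signed 8-bit fixed-point activations with a power-of-two scale (a number of fractional bits), and a hypothetical overflow-ratio threshold; the function names `quantize` and `dynamic_quantize`, the parameter `max_overflow_ratio`, and the update rule (drop one fractional bit per step) are all illustrative assumptions rather than details taken from the paper.

```python
def quantize(values, frac_bits, total_bits=8):
    """Quantize floats to signed fixed point with saturation.

    Returns (codes, overflow_count), where overflow_count is the number
    of values that fell outside the representable range and were clamped.
    """
    qmin = -(1 << (total_bits - 1))
    qmax = (1 << (total_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    codes = []
    overflow = 0
    for v in values:
        q = round(v * scale)
        if q > qmax or q < qmin:
            overflow += 1                 # overflow detected for this value
            q = max(qmin, min(qmax, q))   # saturate to the valid range
        codes.append(q)
    return codes, overflow


def dynamic_quantize(values, frac_bits, total_bits=8, max_overflow_ratio=0.01):
    """Assumed update rule: while too many values overflow, give up one
    fractional bit (doubling the representable range) and requantize."""
    codes, overflow = quantize(values, frac_bits, total_bits)
    while values and overflow / len(values) > max_overflow_ratio and frac_bits > 0:
        frac_bits -= 1
        codes, overflow = quantize(values, frac_bits, total_bits)
    return codes, frac_bits, overflow


if __name__ == "__main__":
    activations = [0.5, 1.0, 3.0, 7.5]
    # Starting with 6 fractional bits, 3.0 and 7.5 do not fit in 8 bits.
    print(dynamic_quantize(activations, frac_bits=6))
```

In this sketch, each dropped fractional bit trades precision for range. A real accelerator would have to make that decision from counters collected during inference instead of requantizing the same tensor in a loop.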