CAQ: Context-Aware Quantization via Reinforcement Learning

Zhijun Tu, Jian Ma, Tian Xia, Wenzhe Zhao, Pengju Ren, Nanning Zheng
DOI: https://doi.org/10.1109/IJCNN52387.2021.9534248
2021-01-01
Abstract: Model quantization is a crucial step for deploying Deep Neural Networks (DNNs) on embedded devices with limited computation and storage resources. Traditional methods usually derive the scaling factor and quantize the weights using information from a single layer. However, our analysis indicates that these scaling-factor selection methods overlook the differences and dependencies among layers, leading to large truncation errors or zeroing errors, which are the main cause of performance degradation. To this end, we propose a Context-Aware Quantization (CAQ) scheme, which formalizes model quantization as a global optimization problem and leverages reinforcement learning to search for the optimal scaling factors based on the entire model. Further, we adopt shift-based scaling factors, which narrow the search space and improve search efficiency, reduce computational complexity during inference, and provide a simpler and more robust activation calibration solution. We extensively test our scheme on a wide range of neural networks, including ResNet 50/101/152, InceptionV3, and MobileNetV2 on ImageNet; the entire search process takes only about 1 hour on a single GeForce RTX 2080 Ti. Compared with existing methods, our scheme achieves better performance: it keeps the post-quantization accuracy loss below 0.25% while reducing memory footprint by 5%-8% and multiply-accumulate (MAC) operations by 2%-4%. We further show that CAQ can be applied to other tasks, such as object detection and segmentation.
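The trade-off between truncation errors and zeroing errors mentioned in the abstract can be illustrated with a minimal sketch. It assumes signed 8-bit weights and a power-of-two scale of 2^-shift; the function names, the toy weight list, and the error counts are hypothetical illustrations, not the paper's CAQ search or its error definitions.

```python
def quantize_shift(weights, shift, bits=8):
    """Quantize floats to signed `bits`-bit integers using scale 2**-shift.

    Assumption: a shift-based (power-of-two) scale, so dequantization is
    q * 2**-shift and can be done with a bit shift in fixed-point hardware.
    """
    qmin, qmax = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    scale = 2.0 ** -shift
    # Round to the nearest integer, then clamp into the representable range.
    q = [max(qmin, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def error_stats(weights, shift, bits=8):
    """Count (clipped, zeroed) weights for a given shift.

    clipped: values pushed outside the integer range (truncation error).
    zeroed:  nonzero values that round to 0 (zeroing error).
    """
    qmin, qmax = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    q, scale = quantize_shift(weights, shift, bits)
    clipped = sum(1 for w in weights if not qmin <= round(w / scale) <= qmax)
    zeroed = sum(1 for w, qi in zip(weights, q) if w != 0 and qi == 0)
    return clipped, zeroed


if __name__ == "__main__":
    # Hypothetical layer weights spanning several orders of magnitude.
    weights = [0.001, 0.02, -0.3, 0.75, -1.9, 2.5]
    for shift in (4, 7, 10):
        clipped, zeroed = error_stats(weights, shift)
        print(f"shift={shift:2d}  clipped={clipped}  zeroed={zeroed}")
```

On this toy list, a small shift (coarse scale) zeroes the small weights, while a large shift (fine scale) clips the large ones. Choosing the shift per layer in isolation only balances these two counts locally; the abstract's argument is that the choice should instead be searched over the whole model.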