PTQ-SO: A Scale Optimization-based Approach for Post-training Quantization of Edge Computing

Kangkang Liu, Ningjiang Chen
DOI: https://doi.org/10.1109/cscwd61410.2024.10580660
2024-01-01
Abstract: As deep convolutional neural networks have grown more accurate, they have been widely adopted for computer vision tasks. However, large convolutional models require substantial memory and computing resources, which makes it difficult to meet the low-latency and reliability requirements of edge computing when a model is deployed locally on resource-limited edge devices. Quantization is a model compression technique that can effectively reduce model size, computation cost, and inference latency, but quantization noise lowers the accuracy of the quantized model. To address this accuracy loss, this paper proposes a post-training quantization method based on scale optimization. The method reduces the influence of redundant model parameters on the quantization parameters during quantization, thereby optimizing the scale factor and reducing the quantization error. This improves the accuracy of the quantized model, reduces inference latency, and improves the reliability of edge applications. Experimental results show that the proposed method improves the accuracy of the quantized model under different quantization strategies and quantization bit widths, with the best quantized model gaining 1.36% in absolute accuracy. This improvement favors the deployment of deep neural networks in edge environments.
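The abstract does not spell out the optimization procedure, so the following is only a minimal sketch of the general idea behind scale-factor selection in post-training quantization, not the authors' PTQ-SO algorithm. It assumes symmetric uniform quantization and a simple grid search over candidate scales that minimizes mean squared error. The function names (`fake_quantize`, `max_scale`, `search_scale`) and the synthetic weights are hypothetical and chosen for illustration.

```python
import random


def fake_quantize(values, scale, bits=8):
    """Symmetric uniform quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    out = []
    for v in values:
        q = round(v / scale)             # map to the integer grid
        q = max(-qmax, min(qmax, q))     # clip to the representable range
        out.append(q * scale)            # map back to floating point
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def max_scale(values, bits=8):
    """Baseline scale: the largest magnitude maps to the largest integer."""
    qmax = 2 ** (bits - 1) - 1
    return max(abs(v) for v in values) / qmax


def search_scale(values, bits=8, steps=100):
    """Grid-search scales at or below the baseline, keeping the lowest MSE.

    Assumption for this sketch: shrinking the scale clips a few outlier
    values but gives finer resolution to the bulk of the values.
    """
    base = max_scale(values, bits)
    best_scale = base
    best_err = mse(values, fake_quantize(values, base, bits))
    for i in range(1, steps):
        cand = base * (1.0 - i / steps)
        err = mse(values, fake_quantize(values, cand, bits))
        if err < best_err:
            best_scale, best_err = cand, err
    return best_scale, best_err


# Synthetic weights: a narrow bulk plus two outliers that inflate the range.
random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(2000)] + [1.5, -1.2]

base = max_scale(weights, bits=4)
base_err = mse(weights, fake_quantize(weights, base, bits=4))
opt_scale, opt_err = search_scale(weights, bits=4)
```

In this toy setup the two outliers stretch the baseline scale so far that most of the small weights collapse onto a few integer levels. The search is free to pick a smaller scale that clips the outliers in exchange for finer resolution elsewhere, which is one common way a scale factor can be tuned to lower quantization error.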