PSE: Mixed Quantization Framework of Neural Networks for Efficient Deployment
Yingqing Yang, Guanzhong Tian, Mingyuan Liu, Yihao Chen, Jun Chen, Yong Liu, Yu Pan, Longhua Ma
DOI: https://doi.org/10.1007/s11554-023-01366-9
IF: 2.293
2023-01-01
Journal of Real-Time Image Processing
Abstract: Quantization is a promising approach for deploying deep neural networks on resource-limited devices. However, existing methods struggle to deliver both computational acceleration and parameter compression while maintaining accuracy. To achieve this goal, we propose PSE, a mixed quantization framework that combines product quantization (PQ), scalar quantization (SQ), and error correction. Specifically, we first apply PQ to the weight matrix to obtain a floating-point codebook and an index matrix. We then use SQ to quantize the codebook into integers and reconstruct an integer weight matrix. Finally, we propose an error correction algorithm that updates the quantized codebook to minimize the quantization error. We extensively evaluate the proposed method on various backbones, including VGG-16, ResNet-18/50, MobileNetV2, ShuffleNetV2, EfficientNet-B3/B7, and DenseNet-201, on the CIFAR-10 and ILSVRC-2012 benchmarks. The experiments demonstrate that PSE reduces computational complexity and model size with acceptable accuracy loss. For example, ResNet-18 achieves a 1.8× acceleration ratio and a 30.4× compression ratio with less than 1.54
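
The three stages named in the abstract (PQ, then SQ of the codebook, then error correction) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. All function names, the subvector length `d`, the codebook size `k`, and the 8-bit symmetric quantizer are assumptions chosen for illustration. The abstract does not specify the error correction algorithm, so the sketch substitutes a simple least-squares rescaling of the dequantization scale as a stand-in.

```python
import numpy as np

def product_quantize(W, d=4, k=16, iters=10, seed=0):
    """PQ stage: split W into length-d subvectors and cluster them with plain k-means.
    Returns a floating-point codebook (k, d) and one codeword index per subvector."""
    rng = np.random.default_rng(seed)
    sub = W.reshape(-1, d)  # assumes W.size is divisible by d
    codebook = sub[rng.choice(len(sub), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = ((sub[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        idx = dist.argmin(1)
        for j in range(k):
            members = sub[idx == j]
            if len(members):  # leave empty clusters unchanged
                codebook[j] = members.mean(0)
    # Recompute assignments so idx matches the final codebook.
    idx = ((sub[:, None, :] - codebook[None, :, :]) ** 2).sum(-1).argmin(1)
    return codebook, idx

def scalar_quantize(codebook, bits=8):
    """SQ stage: symmetric uniform quantization of the codebook to signed integers."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.abs(codebook).max())
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(codebook / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def reconstruct(q, idx, scale, shape):
    """Rebuild an approximate weight matrix from integer codewords and indices."""
    return (q[idx].astype(np.float64) * scale).reshape(shape)

def correct_scale(W, q, idx):
    """Stand-in error correction (an assumption, not the paper's algorithm):
    choose the scale s that minimizes ||W - s * Q||^2 in closed form."""
    r = q[idx].astype(np.float64).ravel()
    w = W.astype(np.float64).ravel()
    denom = r @ r
    return float(w @ r / denom) if denom > 0 else 1.0

# Usage on a random matrix standing in for one layer's weights.
rng = np.random.default_rng(1)
W = rng.standard_normal((64, 32))
codebook, idx = product_quantize(W)
q, scale = scalar_quantize(codebook)
err_before = np.linalg.norm(W - reconstruct(q, idx, scale, W.shape))
err_after = np.linalg.norm(W - reconstruct(q, idx, correct_scale(W, q, idx), W.shape))
print(f"reconstruction error: {err_before:.4f} -> {err_after:.4f}")
```

Because the rescaled value is the least-squares optimum over all scalar scales, the reconstruction error after correction cannot exceed the error with the initial scale. The paper's algorithm updates the quantized codebook itself, which this sketch does not attempt.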