Fast Non-Uniform Quantization of Neural Networks

Yuan Gao, Qiyue Wang, Chen Zhao, Yong Yuan
DOI: https://doi.org/10.1109/icccbda55098.2022.9778293
2022-01-01
Abstract: Neural Networks (NNs) have achieved state-of-the-art (SOTA) performance in a number of domains but suffer from high computational complexity. Network quantization can effectively reduce computation and memory costs without modifying network structures, facilitating the deployment of NNs on cloud and edge devices. However, low-bit quantization without time-consuming training or access to the full training set remains a challenging problem. Inspired by the traditional companding technique from signal processing, we propose a novel method to achieve fast non-uniform quantization of NNs using only a few unlabeled samples. Extensive experiments on ImageNet2012 demonstrate that the proposed method achieves efficiency and accuracy simultaneously. We further show that the proposed method extends to other computer vision tasks such as object detection and semantic segmentation.
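The abstract does not describe the method itself, so the following is only an illustration of the general companding idea it refers to, not the authors' algorithm. It is a minimal sketch that assumes classical mu-law companding with a per-tensor scale: values are compressed by a logarithmic curve, rounded on a uniform grid, and expanded back, which places more quantization levels near zero. The function names, the mu value of 255, and the symmetric signed grid are all assumptions chosen for this example.

```python
import math


def mu_law_compress(x, mu=255.0):
    """Map x in [-1, 1] to [-1, 1], stretching the region near zero."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=255.0):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def compand_quantize(weights, bits=4, mu=255.0):
    """Non-uniform quantization: compress, round uniformly, expand.

    Assumes a non-empty list of floats. The per-tensor max-abs scale and
    the symmetric grid are illustrative choices, not taken from the paper.
    """
    scale = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    levels = 2 ** (bits - 1) - 1                 # e.g. 7 steps per sign at 4 bits
    quantized = []
    for w in weights:
        y = mu_law_compress(w / scale, mu)       # companded value in [-1, 1]
        q = round(y * levels) / levels           # uniform rounding in companded domain
        quantized.append(mu_law_expand(q, mu) * scale)
    return quantized


if __name__ == "__main__":
    w = [-1.0, -0.5, -0.01, 0.0, 0.02, 0.3, 1.0]
    print(compand_quantize(w, bits=4))
```

Because the rounding happens in the companded domain, the reconstructed levels are dense near zero and sparse near the extremes, which suits weight distributions concentrated around zero. At 4 bits this sketch produces at most 15 distinct output values.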