Unsupervised Network Quantization via Fixed-Point Factorization
Peisong Wang, Xiangyu He, Qiang Chen, Anda Cheng, Qingshan Liu, Jian Cheng
DOI: https://doi.org/10.1109/tnnls.2020.3007749
IF: 14.255
2021-06-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: Deep neural networks (DNNs) have achieved remarkable performance in a wide range of applications, but at the cost of large memory footprints and high computational complexity. Fixed-point network quantization has emerged as a popular acceleration and compression method, but it still suffers from severe performance degradation at extremely low bit widths. Moreover, current fixed-point quantization methods rely heavily on supervised retraining with large amounts of labeled training data, which are hard to obtain in real-world applications. In this article, we propose an efficient framework, the fixed-point factorized network (FFN), that turns all weights into ternary values, i.e., {−1, 0, 1}. We highlight that the proposed FFN framework achieves negligible degradation even without any supervised retraining on labeled data. Because the activations can be easily quantized into an 8-bit format, the resulting networks involve only low-bit fixed-point additions, which are significantly more efficient than 32-bit floating-point multiply–accumulate operations (MACs). Extensive experiments on large-scale ImageNet classification and object detection on MS COCO show that the proposed FFN achieves more than $20\times$ compression and removes most of the multiply operations with comparable accuracy. Code is available on GitHub at https://github.com/wps712/FFN.
computer science, artificial intelligence, theory & methods, engineering, electrical & electronic, hardware & architecture
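
The abstract's claim that ternary weights reduce inference to fixed-point additions can be illustrated with a small sketch. This is not the authors' FFN factorization algorithm, which the abstract does not detail; it only shows, under the assumption of an already-ternary weight matrix and integer (e.g., 8-bit) activations, how a matrix-vector product needs no multiplications. The function names and the example values are hypothetical.

```python
def ternary_matvec(weights, activations):
    """Multiply a ternary matrix by an integer vector using only add/subtract.

    weights: list of rows, each entry in {-1, 0, 1} (assumed already quantized).
    activations: list of integers (e.g., 8-bit quantized activations).
    """
    outputs = []
    for row in weights:
        acc = 0
        for w, a in zip(row, activations):
            if w == 1:
                acc += a      # +1 weight: add the activation
            elif w == -1:
                acc -= a      # -1 weight: subtract the activation
            elif w != 0:
                raise ValueError("weights must be in {-1, 0, 1}")
            # 0 weight: skip, no work at all
        outputs.append(acc)
    return outputs


def reference_matvec(weights, activations):
    """Ordinary multiply-accumulate version, used only as a cross-check."""
    return [sum(w * a for w, a in zip(row, activations)) for row in weights]


# Hypothetical example: a 3x4 ternary weight matrix and 4 integer activations.
W = [
    [1, 0, -1, 1],
    [0, -1, -1, 0],
    [1, 1, 0, -1],
]
x = [12, -7, 100, 3]

y = ternary_matvec(W, x)
print(y)                              # [-85, -93, 2]
print(y == reference_matvec(W, x))    # True
```

Zero weights are skipped entirely, so sparsity in the ternary matrix reduces the number of additions as well as removing multiplications.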