A 4-Bit Integer-Only Neural Network Quantization Method Based on Shift Batch Normalization

Qingyu Guo, Xiaoxin Cui, Jian Zhang, Aifei Zhang, Xinjie Guo, Yuan Wang
DOI: https://doi.org/10.1109/iscas48785.2022.9938013
2022-01-01
Abstract: Neural networks are powerful, but at the cost of huge amounts of computation, which makes deploying them on edge devices especially challenging. Quantization can alleviate this cost, but most quantization methods are not sufficiently hardware-friendly. In this paper, we propose an integer-only quantization method. Requiring no division or large-integer multiplication, the method is suitable for deployment on co-designed hardware platforms. We apply 4-bit quantization to several classical networks and their corresponding datasets. On MNIST, CIFAR10 and CIFAR100, the quantized networks perform as well as the original networks. On SpeechCommands, the accuracy loss induced by quantization is 0.16%. We also deployed quantized networks under the OpenCL framework and on a flash-based in-memory-computing chip to verify the method's feasibility.
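The abstract does not spell out the shift batch normalization itself, so the following is only a generic illustration of the underlying idea: when every scale factor is a power of two, rescaling an integer accumulator needs just an add and an arithmetic right shift, with no division or wide multiplication. All function names, the rounding rule, and the shift values are assumptions made for this sketch and are not taken from the paper.

```python
# Illustrative sketch only: power-of-two (shift) rescaling for signed 4-bit values.
# Names and parameters are hypothetical, not the paper's algorithm.

INT4_MIN, INT4_MAX = -8, 7


def clip_int4(v):
    """Saturate an integer to the signed 4-bit range [-8, 7]."""
    return max(INT4_MIN, min(INT4_MAX, v))


def quantize_int4(values, shift):
    """Offline step: map floats to 4-bit integers with an assumed scale of 2**-shift."""
    return [clip_int4(round(v * (1 << shift))) for v in values]


def int_dot(weights, activations):
    """Integer-only multiply-accumulate over small (4-bit) operands."""
    return sum(w * a for w, a in zip(weights, activations))


def shift_requantize(acc, shift):
    """Rescale an integer accumulator by 2**-shift using an add and a right shift.

    Adding half of the divisor before shifting rounds to nearest instead of
    always rounding toward negative infinity.
    """
    if shift > 0:
        acc = (acc + (1 << (shift - 1))) >> shift
    return clip_int4(acc)


if __name__ == "__main__":
    w = quantize_int4([0.5, -0.3, 1.2], shift=3)  # quantized weights
    a = [1, 2, 3]                                 # already-quantized activations
    acc = int_dot(w, a)                           # wide integer accumulator
    print(w, acc, shift_requantize(acc, shift=2))
```

Only `quantize_int4` touches floating point, and it would run once before deployment; the inference path (`int_dot` followed by `shift_requantize`) uses integer addition, small multiplications, and shifts.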