Efficient Deep Convolutional Neural Networks Accelerator Without Multiplication and Retraining

Weihong Xu, Zaichen Zhang, Xiaohu You, Chuan Zhang
DOI: https://doi.org/10.1109/icassp.2018.8461627
2018-01-01
Abstract: Recently, low-precision weight methods have been considered a promising scheme for efficiently implementing inference of deep convolutional neural networks (DCNNs), but they suffer from expensive retraining cost and accuracy degradation. In this paper, a low-bit, retraining-free quantization method is proposed that enables DCNNs to perform inference with only shift and add operations. Its efficiency is demonstrated in terms of power consumption and chip area. Huffman coding is adopted for further compression. Then, by exploring a two-level systolic design, an efficient hardware accelerator is introduced for the given quantization strategy. Experimental results show that our method achieves higher accuracy on ImageNet than other low-precision networks without a retraining process. Compression of 5× to 8× is obtained on popular models compared to their full-precision counterparts. Furthermore, hardware implementation indicates a good reduction in slices while maintaining throughput.
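To illustrate the general idea of replacing multiplication with shift and add operations, the following is a minimal sketch and not the paper's actual quantization scheme, which the abstract does not specify. It assumes each weight is greedily approximated by a small sum of signed powers of two, and that activations are integers (for example, fixed-point values). The function names, the `num_terms` parameter, and the exponent range are hypothetical choices made for this example.

```python
import math


def quantize_to_power_of_two_terms(weight, num_terms=2, min_exp=-8, max_exp=0):
    """Greedily approximate `weight` as a sum of signed powers of two.

    Returns a list of (sign, exponent) pairs. This greedy rule is an
    assumption for illustration, not the method from the paper.
    """
    terms = []
    residual = weight
    for _ in range(num_terms):
        if residual == 0:
            break
        sign = 1 if residual > 0 else -1
        # Nearest exponent in the log domain, clipped to the allowed range.
        exp = round(math.log2(abs(residual)))
        exp = max(min_exp, min(max_exp, exp))
        terms.append((sign, exp))
        residual -= sign * 2.0 ** exp
    return terms


def shift_add_multiply(activation, terms):
    """Multiply an integer activation by the quantized weight using only
    shifts, additions, and subtractions."""
    result = 0
    for sign, exp in terms:
        if exp >= 0:
            shifted = activation << exp
        else:
            shifted = activation >> (-exp)  # arithmetic right shift
        result += shifted if sign > 0 else -shifted
    return result


# 0.75 is represented as 2^0 - 2^-2, so 64 * 0.75 becomes (64 << 0) - (64 >> 2).
terms = quantize_to_power_of_two_terms(0.75)
product = shift_add_multiply(64, terms)
```

With more terms per weight the approximation gets closer to the full-precision value, at the cost of more shift and add operations per weight; right shifts also discard low-order bits, so results are exact only when the activation is divisible by the corresponding power of two.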