LPAC - A Low-Precision Accelerator for CNN on FPGAs.

Tianyu Zhang, Tiantian Han, Lu Tian, Yi Li, Xijie Jia, Guangdong Liu, Pingbo An, Yingran Tan, Lingzhi Sui, Shaoxia Fang, Dongliang Xie, Michaela Blott, Yi Shan
DOI: https://doi.org/10.1145/3373087.3375343
2020-01-01
Abstract: Low-bit quantization of neural networks is required on edge devices to achieve lower power consumption and higher performance. However, 8-bit networks consume substantial resources, while binary networks suffer accuracy degradation. We therefore propose a full-process, hardware-friendly quantization solution, 4A4W (4-bit activations and 4-bit weights), that achieves a better accuracy/resource trade-off. It introduces no additional floating-point operations and achieves accuracy comparable to full precision. We also implement a low-precision accelerator for CNNs (LPAC) on a Xilinx FPGA, which takes full advantage of the FPGA's DSPs by mapping convolutional computations onto them efficiently. Through on-chip reassignment management and resource-saving analysis, high performance can be achieved even on small chips. Under the same resources, our 4A4W solution achieves 1.8x higher performance and 2.42x higher power efficiency than 8A8W. On ImageNet classification, the Top-5 accuracy gap to full precision is less than 1%. On human pose estimation, we achieve 261 frames per second on ZU2EG, a 1.78x speedup over 8A8W, with only a 1.62% accuracy gap to full precision. These results indicate that the solution generalizes well across tasks.
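To make the 4A4W idea concrete, here is a minimal Python/NumPy sketch under stated assumptions; the abstract does not describe LPAC's actual quantizer or DSP mapping. It assumes unsigned 4-bit activations (e.g. after ReLU) and signed 4-bit weights, both quantized with a power-of-two step so that rescaling reduces to bit shifts (one plausible way to avoid extra floating-point work). The helper names `quantize_pow2`, `int_dot`, and `packed_mul` are hypothetical. `packed_mul` illustrates a common trick for using a wide FPGA DSP multiplier with low-bit operands, computing two 4-bit products with a single multiplication; it is not necessarily the mapping LPAC uses.

```python
import numpy as np


def quantize_pow2(x, bits, signed):
    """Uniform quantization with a power-of-two step (hypothetical scheme).

    Returns integer codes q and an exponent e such that x ~= q * 2**e.
    """
    if signed:
        qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    else:
        qmin, qmax = 0, 2 ** bits - 1
    max_abs = float(np.max(np.abs(x)))
    # Smallest exponent whose step lets max|x| fit within qmax steps.
    exp = int(np.ceil(np.log2(max_abs / qmax))) if max_abs > 0 else 0
    q = np.clip(np.round(x / 2.0 ** exp), qmin, qmax).astype(np.int64)
    return q, exp


def int_dot(qa, qw):
    # Integer-only multiply-accumulate; the real-valued result is acc * 2**(ea + ew),
    # and with power-of-two exponents that rescale is just a shift.
    return int(np.dot(qa, qw))


def packed_mul(a, w0, w1, k=8):
    """Compute a*w0 and a*w1 with one wide multiplication (illustrative packing).

    a: unsigned 4-bit activation in [0, 15]; w0, w1: signed 4-bit weights in [-8, 7].
    Each product lies in [-120, 105], which fits a signed k=8-bit field.
    """
    packed = (w1 << k) + w0        # both weights share one multiplier operand
    p = a * packed                 # single multiply: a*w1*2**k + a*w0
    lo = p & ((1 << k) - 1)        # low k bits ...
    if lo >= 1 << (k - 1):         # ... reinterpreted as a signed k-bit value
        lo -= 1 << k
    hi = (p - lo) >> k             # the remaining high part is exactly a*w1
    return lo, hi


rng = np.random.default_rng(0)
acts = rng.random(16)              # non-negative activations
weights = rng.standard_normal(16)  # signed weights
qa, ea = quantize_pow2(acts, bits=4, signed=False)
qw, ew = quantize_pow2(weights, bits=4, signed=True)
approx = int_dot(qa, qw) * 2.0 ** (ea + ew)
print("float dot:", float(np.dot(acts, weights)), "4A4W dot:", approx)
print("packed_mul(13, -7, 5):", packed_mul(13, -7, 5))
```

The power-of-two step is a design choice made here for simplicity: it trades some quantization resolution for requantization that needs only shifts. Packing two products per multiplication only works while each product (or partial sum) stays inside its field, so accumulating several products in packed form would need extra guard bits between the fields.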