PRArch: Pattern-Based Reconfigurable Architecture for Deep Neural Network Acceleration

Zhaoming Jiang, Zhuoran Song, Xiaoyao Liang, Naifeng Jing
DOI: https://doi.org/10.1109/hpcc-smartcity-dss50907.2020.00016
2020-01-01
Abstract: Quantization is now widely used to accelerate Deep Neural Network (DNN) inference. Mixed-precision quantization achieves a better compression rate and better accuracy than fixed-precision quantization, but making hardware accelerators such as systolic arrays support mixed precision is non-trivial and costly. In this paper, we propose a Pattern-based Mixed-precision Quantization algorithm, namely PMQ, which transforms mixed-precision kernels into hardware-friendly fixed-precision kernels. We further observe pattern-based sparsity in the high parts of the transformed kernels, which leads to a novel aggregated sparse kernel convolution. Based on PMQ, we propose PRArch, an accelerator that supports mixed-precision convolutional neural networks on a fixed-precision systolic array with minimal overhead. Experiments on several typical convolutional networks show an average speedup of 1.86x over the coarse-grained quantization accelerator Eyeriss under the same computation chip area, with an accuracy drop of less than 1% without fine-tuning.
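To make the idea of "high parts" of transformed kernels concrete, the sketch below shows one generic way a wide integer weight can be split into two narrow parts that a fixed-precision multiplier can handle, and why the high part tends to be sparse. It is a minimal illustration under stated assumptions, not the paper's PMQ algorithm or its aggregated sparse kernel convolution: the 4-bit datapath width (`LOW_BITS`), the sign-magnitude split, and the helper names `split_weight`, `split_kernel` and `dot_split` are all hypothetical.

```python
# Hypothetical sketch of bit-splitting a kernel for a narrow fixed-precision datapath.
# Assumptions (not from the paper): 4-bit datapath, sign-magnitude split.
# This is NOT the PMQ algorithm; it only illustrates "high parts" and their sparsity.

LOW_BITS = 4  # assumed width of the fixed-precision multiplier


def split_weight(w: int, low_bits: int = LOW_BITS) -> tuple[int, int]:
    """Sign-magnitude split so that w == hi * 2**low_bits + lo and |lo| < 2**low_bits."""
    sign = -1 if w < 0 else 1
    mag = abs(w)
    hi = sign * (mag >> low_bits)              # zero whenever |w| < 2**low_bits
    lo = sign * (mag & ((1 << low_bits) - 1))  # low-order magnitude bits
    return hi, lo


def split_kernel(kernel: list[int], low_bits: int = LOW_BITS) -> tuple[list[int], list[int]]:
    """Split a flattened kernel into its high-part and low-part kernels."""
    parts = [split_weight(w, low_bits) for w in kernel]
    return [h for h, _ in parts], [l for _, l in parts]


def dot_split(x: list[int], kernel: list[int], low_bits: int = LOW_BITS) -> tuple[int, int]:
    """Compute dot(x, kernel) using only the two narrow kernels.

    Returns (result, number of high-part multiplications actually performed);
    zero high-part weights are simply skipped.
    """
    hi, lo = split_kernel(kernel, low_bits)
    lo_sum = sum(a * b for a, b in zip(x, lo))
    hi_pairs = [(a, b) for a, b in zip(x, hi) if b != 0]
    hi_sum = sum(a * b for a, b in hi_pairs)
    return hi_sum * (1 << low_bits) + lo_sum, len(hi_pairs)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    kernel = [3, -5, 2, 0, 40, -1, 7, -70, 1]  # mostly small weights, two large ones
    result, hi_macs = dot_split(x, kernel)
    print(result, hi_macs)  # -309 2
```

Here only two of the nine high-part weights are nonzero, so the second pass needs two multiplications instead of nine while the result still equals the full-precision dot product. How PRArch actually exploits such patterns across a systolic array is described in the paper, not reproduced here.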