A High-Performance Systolic Array Accelerator Dedicated for CNN

Jing Shen, Haoqi Ren, Zhifeng Zhang, Jun Wu, Wenqi Pan, Zhenyu Jiang
DOI: https://doi.org/10.1109/icct46805.2019.8947127
2019-01-01
Abstract: The rapid development of artificial intelligence has made convolutional neural networks (CNNs) increasingly important. Traditional CPU-based computing architectures cannot meet the requirements of practical applications, so developing a new hardware computing platform for CNNs has become more urgent. This paper proposes a systolic array accelerator dedicated to CNNs. Because CNN models require a large number of simple multiply-accumulate operations, we optimize the convolution calculation module with a systolic multiply-accumulate (MAC) array. We design three convolution mappings that handle convolutions of different sizes. The accelerator efficiently reuses data held in local storage, which reduces data movement and improves computing performance. To balance storage bandwidth against computational speed, each convolutional layer is subdivided into fine-grained tasks whose execution is scheduled to hide the time spent accessing external storage. The accelerator also supports Winograd convolution for 3×3 weight kernels. A heterogeneous system consisting of the accelerator and the in-house SWIFT digital signal processor (SWIFT DSP) is verified on an FPGA platform. Experimental results show that the accelerator outperforms traditional accelerators under the same conditions.
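To make the idea of a systolic MAC array concrete, the following is a minimal cycle-level sketch, not the authors' design. It assumes an output-stationary dataflow (the abstract does not say which dataflow the accelerator uses) and assumes the convolution has already been lowered to a matrix multiply. The function name `systolic_matmul` is hypothetical. Each processing element (PE) keeps one partial sum, operands flow right and down one PE per cycle, and the input edges are skewed so that matching operands meet in the right PE.

```python
import numpy as np

def systolic_matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Hypothetical cycle-level model of an output-stationary systolic array computing A @ B.

    PE (i, j) holds the stationary partial sum C[i, j]. Row i of A enters the left
    edge delayed by i cycles and column j of B enters the top edge delayed by j cycles,
    so A[i, k] and B[k, j] meet in PE (i, j) at cycle i + j + k.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    dtype = np.result_type(A, B)
    acc = np.zeros((M, N), dtype=dtype)    # stationary partial sums, one per PE
    a_reg = np.zeros((M, N), dtype=dtype)  # A operands, forwarded to the right each cycle
    b_reg = np.zeros((M, N), dtype=dtype)  # B operands, forwarded downwards each cycle
    for t in range(M + N + K - 2):
        # Each PE passes the operands it held last cycle to its right/lower neighbour.
        a_reg[:, 1:] = a_reg[:, :-1].copy()
        b_reg[1:, :] = b_reg[:-1, :].copy()
        # Feed the skewed edges; zeros are injected outside the valid range.
        for i in range(M):
            k = t - i
            a_reg[i, 0] = A[i, k] if 0 <= k < K else 0
        for j in range(N):
            k = t - j
            b_reg[0, j] = B[k, j] if 0 <= k < K else 0
        # Every PE performs one multiply-accumulate per cycle.
        acc += a_reg * b_reg
    return acc

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)
print(np.array_equal(systolic_matmul(A, B), A @ B))
```

The sketch needs M + N + K - 2 cycles to drain, which shows why systolic arrays favour large, regular workloads: the fill and drain overhead is amortized only when K and the number of tiles are large.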
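The abstract states that each convolutional layer is split into fine-grained tasks so that external-storage accesses are hidden behind computation. One common way to achieve this is ping-pong (double) buffering; the abstract does not name the mechanism, so the timing model below is an assumption, and the helper names and the per-tile costs are hypothetical. It compares a schedule that loads and computes each tile in turn with one that fetches tile i+1 while tile i is being computed, ignoring output write-back.

```python
def serial_time(load: list[int], compute: list[int]) -> int:
    """Hypothetical schedule: every tile is loaded, then computed, with no overlap."""
    return sum(load) + sum(compute)

def double_buffered_time(load: list[int], compute: list[int]) -> int:
    """Hypothetical ping-pong schedule: fetch tile i+1 into one buffer while tile i
    is computed from the other; each stage lasts as long as the slower of the two."""
    total = load[0]  # the first tile must arrive before any computation starts
    for i in range(len(compute)):
        next_load = load[i + 1] if i + 1 < len(load) else 0
        total += max(compute[i], next_load)
    return total

# Four tiles, each costing 3 cycles to load and 5 cycles to compute (illustrative numbers).
print(serial_time([3] * 4, [5] * 4), double_buffered_time([3] * 4, [5] * 4))
```

When computation per tile is at least as long as the next tile's load, only the first load remains exposed, which is the sense in which finer task granularity "masks" external access time: smaller tiles shrink that exposed first load, at the cost of more scheduling overhead.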
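The abstract also mentions Winograd convolution for 3×3 kernels. The sketch below shows the widely used F(2×2, 3×3) variant, which produces a 2×2 output tile from a 4×4 input tile with 16 element-wise multiplications instead of the 36 needed by direct computation. The tile size used by the accelerator is not stated in the abstract, so F(2×2, 3×3) is an assumption, and the function names are hypothetical. A direct reference implementation is included to check the result.

```python
import numpy as np

# Transform matrices for Winograd F(2x2, 3x3).
B_T = np.array([[1, 0, -1, 0],
                [0, 1, 1, 0],
                [0, -1, 1, 0],
                [0, 1, 0, -1]], dtype=float)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
A_T = np.array([[1, 1, 1, 0],
                [0, 1, -1, -1]], dtype=float)

def winograd_f2x2_3x3(tile: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Compute a 2x2 output tile from a 4x4 input tile and a 3x3 kernel."""
    U = G @ kernel @ G.T      # 4x4 transformed kernel (can be precomputed per kernel)
    V = B_T @ tile @ B_T.T    # 4x4 transformed input tile
    M = U * V                 # 16 element-wise multiplications
    return A_T @ M @ A_T.T    # inverse transform to the 2x2 output

def direct_3x3(tile: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Reference: valid cross-correlation (the CNN 'convolution') of a 4x4 tile."""
    out = np.zeros((2, 2))
    for r in range(2):
        for c in range(2):
            out[r, c] = np.sum(tile[r:r + 3, c:c + 3] * kernel)
    return out

rng = np.random.default_rng(0)
tile, kernel = rng.standard_normal((4, 4)), rng.standard_normal((3, 3))
print(np.allclose(winograd_f2x2_3x3(tile, kernel), direct_3x3(tile, kernel)))
```

The savings come from moving work into additions in the input and output transforms; because the kernel transform is done once per kernel, the per-tile multiply count drops from 36 to 16, which suits a MAC array whose throughput is bounded by multipliers.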