Dual-Line-Systolic Array for High Performance CNN Accelerator

Peng Xue, Lunshuai Pan, Litao Sun, Mingqiang Huang
DOI: https://doi.org/10.1109/fccm53951.2022.9786215
2022-01-01
Abstract: The systolic array has been a crucial architecture for accelerating convolutional neural networks (CNNs) since the success of Google's TPU (Tensor Processing Unit). In this work, we propose a high-throughput, low-delay dual-line-systolic array for accelerating CNNs. With a line-by-line, vector-style systolic dataflow, the peripheral circuitry can be simplified and the loading/offloading delay can be greatly reduced. In addition, to take full advantage of the INT8 computation of the DSP (digital signal processor) resources in the FPGA, a dual-line-systolic array is developed, which doubles the computation throughput. Finally, the proposed accelerator is deployed on a PYNQ-Z2 board to accelerate the VGG16 network in practice. The peak throughput of the convolution layers reaches 107.21 GOPS, which exceeds that of all previous works on the same hardware platform.
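The abstract does not say how the INT8 throughput of a DSP is doubled. One widely used approach on FPGAs, assumed here purely for illustration and not confirmed as the authors' method, is to pack two signed 8-bit operands into one wide multiplier input so that a single multiplication yields two products that share the other operand. The Python sketch below models only the arithmetic of that trick; the names `SHIFT`, `to_signed`, and `packed_dual_multiply` are hypothetical.

```python
SHIFT = 16  # assumed gap between the two packed operands; wide enough for one INT8 x INT8 product


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def packed_dual_multiply(a, b, c):
    """Return (a * c, b * c) using a single wide multiplication.

    a, b, and c are signed 8-bit integers. a and b are packed into one
    operand, multiplied once by c, and the two products are then separated.
    """
    for x in (a, b, c):
        if not -128 <= x <= 127:
            raise ValueError("operands must fit in signed 8 bits")

    packed = (a << SHIFT) + b   # one wide operand holding both a and b
    product = packed * c        # the single multiplication: (a*c << SHIFT) + b*c

    low = to_signed(product, SHIFT)    # b*c always fits in SHIFT signed bits
    high = (product - low) >> SHIFT    # remove b*c, then shift down to get a*c
    return high, low


if __name__ == "__main__":
    print(packed_dual_multiply(3, -5, 7))
```

Subtracting the sign-extended low field before shifting is what keeps the upper product exact when `b * c` is negative. In hardware this correction is typically a small adder after the multiplier.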