An Efficient CNN Accelerator Using Inter-Frame Data Reuse of Videos on FPGAs

Shengzhao Li, Qin Wang, Jianfei Jiang, Weiguang Sheng, Naifeng Jing, Zhigang Mao
DOI: https://doi.org/10.1109/tvlsi.2022.3151788
2022-01-01
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Abstract: Convolutional neural networks (CNNs) have had great success in computer vision, and many application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) CNN accelerators have been proposed. These accelerators primarily focus on accelerating a single input and are not specifically optimized for video applications. In this article, we focus on the similarities between consecutive inputs in video, and we propose a YOLOv3-tiny CNN FPGA accelerator using incremental operation. The accelerator can skip the convolution operations on data that are similar between consecutive inputs. We also use the Winograd algorithm to optimize the $3\times 3$ convolution operator in the YOLOv3-tiny network to further improve the accelerator’s efficiency. Experimental results show that our accelerator achieves 74.2 frames/s on ImageNet ILSVRC2015. Compared with the original network without the Winograd algorithm and incremental operation, our design provides a $4.10\times$ speedup. Compared with other YOLO network FPGA accelerators applied to video applications, our design provides $3.13\times$–$18.34\times$ the normalized digital signal processor (DSP) efficiency and $1.10\times$–$14.2\times$ the energy efficiency.
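The idea of skipping convolution work for data shared between consecutive frames can be illustrated with a minimal sketch. It relies only on the linearity of convolution: the output for the new frame equals the previous output plus the convolution of the frame difference, so unchanged inputs need no multiplications. This is not the accelerator's actual design, which the abstract does not detail. The 1-D integer data, the exact-equality test for "similar" inputs, and the names `conv1d` and `incremental_conv1d` are all assumptions made for illustration.

```python
def conv1d(signal, kernel):
    """Direct 'valid' 1-D correlation: one output per full kernel window."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def incremental_conv1d(prev_frame, prev_output, frame, kernel):
    """Update prev_output for the new frame, touching only changed inputs.

    Hypothetical helper: returns (output, multiplications_performed).
    """
    k = len(kernel)
    output = list(prev_output)
    mults = 0
    for pos, (old, new) in enumerate(zip(prev_frame, frame)):
        delta = new - old
        if delta == 0:
            continue  # unchanged input: its contribution is already in prev_output
        # A changed input at `pos` feeds outputs pos-k+1 .. pos (clipped to range).
        for j in range(k):
            i = pos - j
            if 0 <= i < len(output):
                output[i] += delta * kernel[j]
                mults += 1
    return output, mults


kernel = [1, 2, 1]
frame0 = [3, 3, 5, 7, 7, 7, 2, 2]
frame1 = [3, 3, 5, 7, 9, 7, 2, 2]  # one sample changed between frames

out0 = conv1d(frame0, kernel)  # full computation for the first frame
out1, mults = incremental_conv1d(frame0, out0, frame1, kernel)
full_mults = len(out0) * len(kernel)  # cost of recomputing frame1 from scratch

print(out1 == conv1d(frame1, kernel), mults, full_mults)
```

In this toy case the incremental update reproduces the full result while performing multiplications only for the single changed sample. A hardware design would work on 2-D feature maps in fixed-point arithmetic and would presumably need some rule for deciding when two values count as similar; those choices are outside what the abstract states.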