A Method for Accelerating YOLO by Hybrid Computing Based on ARM and FPGA

Qilin Xiong, Chun Liao, Zhenhong Yang, Wanlin Gao
DOI: https://doi.org/10.1145/3508546.3508576
2021-01-01
Abstract: Convolutional neural networks (CNNs) have driven rapid progress in target recognition and detection. Compared with traditional machine learning methods, they offer faster detection and higher robustness. However, deploying a CNN model often requires substantial computing resources, which hinders the application of artificial intelligence technology. In this paper, we use a hybrid ARM and FPGA architecture to deploy a You Only Look Once (YOLO) model on the FPGA, improving the efficiency of target recognition and detection under low resource and power consumption. YOLO is a one-stage real-time detection model with high detection speed and remarkable accuracy. High-level Synthesis (HLS) is a technology for rapid FPGA development and verification based on C/C++. We use HLS to implement the pipeline mechanism and the parallel computation of convolution, thereby constructing a forward inference model of YOLOv3-tiny. To accelerate the forward inference process of YOLO, we combine convolution with batch normalization. The FPGA used in the paper is the Xilinx Zynq-7035 system on chip (SoC). We build a software and hardware co-architecture of ARM and FPGA on the Zynq-7035, which makes full use of the logic control advantages of the ARM and the computing advantages of the FPGA. We achieve a speed of 28.99 GOP/s with only 3.715 W power consumption. Finally, compared with the Ryzen 5 3600, we achieve 41.3inference speed at a lower clock rate.
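The abstract states that convolution is combined with batch normalization but does not give the formulas. The sketch below illustrates the standard folding identity, under which a batch-norm layer y = gamma * (x - mean) / sqrt(var + eps) + beta applied to a convolution output can be absorbed into the convolution's weights and biases at inference time. It is not the authors' HLS implementation: the function names, the flat per-channel weight layout, and the sample values are all assumptions made for illustration.

```python
import math


def fold_batchnorm(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into conv weights and biases.

    weights: one flat list of kernel weights per output channel (assumed layout).
    biases, gamma, beta, mean, var: one float per output channel.
    """
    folded_w, folded_b = [], []
    for w, b, g, bt, m, v in zip(weights, biases, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)           # per-channel BN scale
        folded_w.append([wi * scale for wi in w])
        folded_b.append((b - m) * scale + bt)    # bias absorbs mean and beta
    return folded_w, folded_b


def conv_at_point(weights, biases, patch):
    """Convolution output at one spatial position: a dot product per channel."""
    return [sum(wi * xi for wi, xi in zip(w, patch)) + b
            for w, b in zip(weights, biases)]


def batchnorm(values, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization, applied per channel."""
    return [g * (x - m) / math.sqrt(v + eps) + bt
            for x, g, bt, m, v in zip(values, gamma, beta, mean, var)]


# Hypothetical example: 2 output channels, 3-element flattened input patch.
weights = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
biases = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 1.6]
patch = [1.0, 2.0, -1.0]

# Reference path: convolution followed by a separate batch-norm step.
separate = batchnorm(conv_at_point(weights, biases, patch),
                     gamma, beta, mean, var)

# Folded path: a single convolution with precomputed weights and biases.
fw, fb = fold_batchnorm(weights, biases, gamma, beta, mean, var)
fused = conv_at_point(fw, fb, patch)
```

Because the folded weights and biases are computed once offline, the inference path needs no separate normalization pass. That saving is presumably what motivates merging the two layers for a resource-constrained FPGA target.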