WGeod: A General and Efficient FPGA Accelerator for Object Detection
Zihan Wang, Mengying Zhao, Lei Gong, Chao Wang
DOI: https://doi.org/10.1109/ispa-bdcloud-socialcom-sustaincom57177.2022.00099
2022-01-01
Abstract: With the development of high-performance embedded applications, object detection algorithms are increasingly deployed on resource-constrained embedded devices. The classification and location information they provide is the basis for further decisions in these applications, so real-time performance and low power consumption are especially important. In this paper, we propose an FPGA-based neural network accelerator for object detection, which improves the speed, throughput and power efficiency of inference with little loss of accuracy. The accelerator balances the speed difference between computation and memory access. It is scalable and can be configured for CNNs of any size, and it is general enough to support the acceleration of YOLOv2, YOLOv3 and their Tiny versions. We deploy the Tiny YOLOv3 network on the FPGA-based accelerator, CPU and GPU platforms, and compare them in terms of speed, throughput and power efficiency. For the Tiny YOLOv3 network, the accelerator achieves 1.61 FPS and a throughput of 8.95 GOP/s, which is 23.81x, 4.11x and 3.00x better than the Intel Core i5-10210U CPU with the O0, O1 and O3 compilation optimization options, respectively. With 2.11 W of power, the accelerator delivers a power efficiency of 4.24 GOP/s/W, which is 141.33x, 28.27x and 21.20x better than the Intel Core i5-10210U CPU with the O0, O1 and O3 options, respectively.
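The reported figures can be cross-checked with simple arithmetic. The following is a minimal Python sketch, not part of the paper: the function names are hypothetical, and the per-frame workload and CPU baseline values are derived here by division from the abstract's numbers rather than quoted from the paper.

```python
# Figures quoted in the abstract for Tiny YOLOv3 on the accelerator.
FPS = 1.61                # frames per second
THROUGHPUT_GOPS = 8.95    # GOP/s
POWER_W = 2.11            # watts


def power_efficiency(throughput_gops: float, power_w: float) -> float:
    """Power efficiency in GOP/s/W: throughput divided by power."""
    return throughput_gops / power_w


def ops_per_frame(throughput_gops: float, fps: float) -> float:
    """Workload per frame in GOP implied by throughput and frame rate."""
    return throughput_gops / fps


def implied_baseline(value: float, speedup: float) -> float:
    """Baseline value implied by a reported speedup (derived, not reported)."""
    return value / speedup


if __name__ == "__main__":
    eff = power_efficiency(THROUGHPUT_GOPS, POWER_W)
    print(f"power efficiency: {eff:.2f} GOP/s/W")
    print(f"workload per frame: {ops_per_frame(THROUGHPUT_GOPS, FPS):.2f} GOP")
    # Assumption: the 23.81x speedup applies to throughput as well as FPS.
    print(f"implied CPU throughput: {implied_baseline(THROUGHPUT_GOPS, 23.81):.3f} GOP/s")
    # Assumption: the 141.33x gain is measured against CPU power efficiency.
    print(f"implied CPU efficiency: {implied_baseline(eff, 141.33):.3f} GOP/s/W")
```

Dividing throughput by power reproduces the stated 4.24 GOP/s/W, and dividing throughput by frame rate gives roughly 5.56 GOP per frame, so the three headline numbers are mutually consistent.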