Algorithm-Hardware Co-Optimization for Energy-Efficient Drone Detection on Resource-Constrained FPGA

Han-sok Suh, Jian Meng, Ty Nguyen, Vijay Kumar, Yu Cao, Jae-sun Seo
DOI: https://doi.org/10.1145/3583074
IF: 2.837
2023-02-16
ACM Transactions on Reconfigurable Technology and Systems
Abstract: Convolutional neural network (CNN) based object detection has achieved very high accuracy; for example, single-shot multi-box detectors (SSD) can efficiently detect and localize various objects in an input image. However, such detectors require substantial computation and memory storage, which makes efficient inference difficult on resource-constrained hardware devices such as drones or unmanned aerial vehicles (UAVs). Drone/UAV detection is an important task for applications including surveillance, defense, and multi-drone self-localization and formation control. In this paper, we design and co-optimize the algorithm and hardware for energy-efficient drone detection on resource-constrained FPGA devices. We train the SSD object detection algorithm on a custom drone dataset. For inference, we employ low-precision quantization and adapt the width of the SSD CNN model. To improve throughput, we use dual-data rate operations for DSPs, effectively doubling the throughput with limited DSP counts. For different SSD algorithm models, we analyze accuracy, measured as mean average precision (mAP), and evaluate the corresponding FPGA hardware utilization, DRAM communication, and throughput optimization. We evaluate the FPGA hardware on a custom drone dataset, Pascal VOC, and COCO2017. Our proposed design achieves a high mAP of 88.42% on the multi-drone dataset, with a high energy efficiency of 79 GOPS/W and a throughput of 158 GOPS, using a Xilinx Zynq ZU3EG FPGA device on the Open Vision Computer version 3 (OVC3) platform. Our design achieves 1.1-8.7× higher energy efficiency than prior works that used the same Pascal VOC dataset and the same FPGA device, but at a low power consumption of 2.54 W. For the COCO dataset, our MobileNet-V1 implementation achieves an mAP of 16.8 and an energy efficiency of 4.9 FPS/W, which is ∼1.9× higher than prior FPGA works or other commercial hardware platforms.
computer science, hardware & architecture
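As a rough illustration of how the efficiency figures in the abstract relate to one another, the sketch below assumes the usual definition of energy efficiency as throughput divided by power; the abstract does not state the definition explicitly. The function names are hypothetical and chosen only for this example. The 158 GOPS and 79 GOPS/W inputs come from the abstract, while the resulting power figure is derived arithmetic, not a value reported by the authors. The separately reported 2.54 W belongs to the Pascal VOC comparison and may correspond to a different model configuration.

```python
def energy_efficiency(throughput_gops: float, power_watts: float) -> float:
    """Energy efficiency in GOPS/W, assuming efficiency = throughput / power."""
    if power_watts <= 0:
        raise ValueError("power must be positive")
    return throughput_gops / power_watts


def implied_power_watts(throughput_gops: float, efficiency_gops_per_watt: float) -> float:
    """Power implied by a throughput and an efficiency figure (same assumption)."""
    if efficiency_gops_per_watt <= 0:
        raise ValueError("efficiency must be positive")
    return throughput_gops / efficiency_gops_per_watt


# Figures quoted in the abstract for the multi-drone dataset.
throughput = 158.0  # GOPS
efficiency = 79.0   # GOPS/W

power = implied_power_watts(throughput, efficiency)
print(power)  # 2.0 (derived under the stated assumption, not reported)
```

The same helper can be reused to sanity-check other throughput and power pairs, provided both numbers refer to the same model and operating point.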