O3NMS: an Out-Of-Order-Based Low-Latency Accelerator for Non-Maximum Suppression

Yuzhou Chen, Jinming Zhang, Dongxu Lv, Xi Yu, Guanghui He
DOI: https://doi.org/10.1109/iscas46773.2023.10181731
2023-01-01
Abstract: Non-maximum suppression (NMS) is an important post-processing method for eliminating overlapping bounding boxes in object detection neural networks. Because it suffers from quadratic computational complexity and frequent memory access, NMS has become a bottleneck for detection latency. To address this problem, we propose out-of-order NMS (O3NMS), a hardware-software co-optimization approach that reduces both the latency and the area overhead of the NMS accelerator. To reduce startup latency, we devise the O3NMS algorithm, which removes the pre-sort operation. To support the O3NMS algorithm efficiently, we design a specialized hardware accelerator. Our design has been implemented on a Xilinx FPGA and in SIMC 40nm technology. Experiments demonstrate that the O3NMS accelerator achieves a 2.51x speedup and a 37% reduction in FPGA resource utilization compared with the state-of-the-art (SOTA) NMS accelerator.
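For context, the conventional sort-based greedy NMS that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the baseline only, not the O3NMS algorithm, whose details the abstract does not give. The names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the 0.5 threshold are assumptions chosen for the example. The sketch shows the two costs the abstract mentions: the pre-sort that must finish before any box can be processed, and the pairwise overlap checks that grow quadratically in the worst case.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Conventional greedy NMS; returns indices of kept boxes."""
    # Pre-sort by descending score: nothing below can start until this finishes.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Compare against every box kept so far: O(n^2) IoU checks in the worst case.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))
```

In this sketch the second box overlaps the first with an IoU of about 0.68 and is suppressed, while the third box does not overlap either and is kept.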