A 0.57-GOPS/DSP Object Detection PIM Accelerator on FPGA

Bo Jiao, Jinshan Zhang, Yuanyuan Xie, Shunli Wang, Haozhe Zhu, Xiaoyang Kang, Zhiyan Dong, Lihua Zhang, Chixiao Chen
DOI: https://doi.org/10.1145/3394885.3431659
2021-01-01
Abstract: This paper presents an object detection accelerator featuring a processing-in-memory (PIM) architecture on FPGAs. PIM architectures are well known for their energy efficiency and for avoiding the memory wall. In the accelerator, a PIM unit is built from BRAM and LUT-based counters, which also helps improve the DSP performance density. The overall architecture consists of 64 PIM units and three memory buffers that store inter-layer results. A shrunk and quantized Tiny-YOLO network is mapped to the PIM accelerator, where DRAM access is fully eliminated during inference. The design achieves a throughput of 201.6 GOPS at a 100 MHz clock rate and, correspondingly, a performance density of 0.57 GOPS/DSP.
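The headline figures in the abstract can be related to one another with simple arithmetic. The sketch below is a back-of-the-envelope check only: it uses the four numbers quoted above, and the function names are hypothetical. The DSP count it yields is implied by the rounded density figure, not stated in the abstract, and the per-unit figure assumes that throughput is spread evenly over the 64 PIM units, which the abstract does not confirm.

```python
# Figures quoted in the abstract.
THROUGHPUT_GOPS = 201.6       # reported throughput
CLOCK_MHZ = 100.0             # reported clock rate
DENSITY_GOPS_PER_DSP = 0.57   # reported performance density (rounded)
NUM_PIM_UNITS = 64            # reported number of PIM units


def implied_dsp_count(throughput_gops, density_gops_per_dsp):
    """DSP slices implied by throughput / density (an estimate, not a reported value)."""
    return throughput_gops / density_gops_per_dsp


def ops_per_cycle(throughput_gops, clock_mhz):
    """Operations completed per clock cycle across the whole design."""
    return (throughput_gops * 1e9) / (clock_mhz * 1e6)


dsps = implied_dsp_count(THROUGHPUT_GOPS, DENSITY_GOPS_PER_DSP)
total_ops = ops_per_cycle(THROUGHPUT_GOPS, CLOCK_MHZ)
# Assumption: work is divided evenly among the PIM units.
ops_per_unit = total_ops / NUM_PIM_UNITS

print(f"implied DSP count:      {dsps:.1f}")
print(f"ops per cycle (total):  {total_ops:.1f}")
print(f"ops per cycle per unit: {ops_per_unit:.2f}")
```

Under these assumptions the numbers suggest roughly 354 DSP slices and about 2016 operations per cycle for the whole design. Because 0.57 is a rounded value, the implied DSP count is approximate.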