A Memory-Efficient Hardware Architecture for Deformable Convolutional Networks

Yue Yu,Jiapeng Luo,Wendong Mao,Zhongfeng Wang
DOI: https://doi.org/10.1109/sips52927.2021.00033
2021-10-01
Abstract:In recent years, deformable convolutional networks are widely adopted in object detection tasks and have achieved outstanding performance. Compared with conventional convolution, the deformable convolution has an irregular receptive field to adapt to objects with different sizes and shapes. However, the irregularity of the receptive field causes inefficient access to memory and increases the complexity of control logic. Toward hardware-friendly implementation, prior works change the characteristics of deformable convolution by restricting the receptive field, leading to accuracy degradation. In this paper, we develop a dedicated Sampling Core to sample and rearrange the input pixels, enabling the convolution array to access the inputs regularly. In addition, a memory-efficient dataflow is introduced to match the processing speed of the Sampling Core and convolutional array, which improves hardware utilization and reduces access to off-chip memory. Based on these optimizations, we propose a novel hardware architecture for the deformable convolution network, which is the first work to accelerate the original deformable convolution network. With the design of the memory-efficient architecture, the access to the off-chip memory is reduced significantly. We implement it on Xilinx Virtex-7 FPGA, and experiments show that the energy efficiency reaches 50.29 GOPS/W, which is 2.5 times higher compared with executing the same network on GPU.
What problem does this paper attempt to address?