An Efficient Hardware Implementation of Dilated Convolution Using a Novel Channel-Equivalent Decomposition Method.

Yuheng Xia, Yishuo Meng, Siwei Xiang, Jianfei Wang, Chen Yang
DOI: https://doi.org/10.1109/ICTA60488.2023.10364302
2023-01-01
Abstract: This paper proposes an efficient hardware deployment method for dilated convolution, built on a novel channel-equivalent decomposition method that makes full use of the effective (non-zero) data in the dilated convolution. The decomposition significantly reduces the redundant zero operations that dilated convolution otherwise requires, greatly reduces memory bandwidth requirements, and improves throughput. The dilated convolution operation unit is implemented on the Xilinx ZCU102 FPGA at a working frequency of 333 MHz. Test results show a performance of 192 GOPS when running the dilated convolution layer of VGG-SSD. At a power consumption of 2.1 W, the average energy efficiency of the system is 91.4 GOPS/W.
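The abstract does not spell out the channel-equivalent decomposition itself, so the following is only a minimal sketch of the general idea of removing zero operations from a dilated convolution. It is not the authors' method. It uses the well-known phase (polyphase) decomposition as a stand-in: a convolution with dilation d is split into d x d dense convolutions over subsampled copies of the input, and each result is compared with a reference that inserts zeros into the kernel. All function names are hypothetical, and a single channel, stride 1, and no padding are assumed.

```python
import numpy as np


def dense_corr2d(x, w):
    """Plain 'valid' 2-D cross-correlation, stride 1, no dilation."""
    k_h, k_w = w.shape
    out_h, out_w = x.shape[0] - k_h + 1, x.shape[1] - k_w + 1
    y = np.zeros((out_h, out_w), dtype=x.dtype)
    for u in range(k_h):
        for v in range(k_w):
            y += w[u, v] * x[u:u + out_h, v:v + out_w]
    return y


def dilated_corr2d_zero_inserted(x, w, d):
    """Reference: expand the kernel with zeros, then run a dense pass.

    Each output costs (d*(K-1)+1)**2 multiplications, most of them by zero.
    """
    k_h, k_w = w.shape
    w_big = np.zeros((d * (k_h - 1) + 1, d * (k_w - 1) + 1), dtype=w.dtype)
    w_big[::d, ::d] = w
    return dense_corr2d(x, w_big)


def dilated_corr2d_phase(x, w, d):
    """Illustrative stand-in (assumption, not the paper's decomposition).

    Output pixels with the same (row mod d, col mod d) phase only read input
    pixels of that same phase, so each phase is a small dense convolution
    with the original K x K kernel and no zero multiplications.
    """
    k_h, k_w = w.shape
    out_h = x.shape[0] - d * (k_h - 1)
    out_w = x.shape[1] - d * (k_w - 1)
    y = np.zeros((out_h, out_w), dtype=x.dtype)
    for p in range(d):
        for q in range(d):
            y[p::d, q::d] = dense_corr2d(x[p::d, q::d], w)
    return y


rng = np.random.default_rng(0)
x = rng.integers(-4, 5, size=(13, 11))   # toy feature map
w = rng.integers(-3, 4, size=(3, 3))     # toy 3x3 kernel
for d in (1, 2, 3):
    same = np.array_equal(dilated_corr2d_phase(x, w, d),
                          dilated_corr2d_zero_inserted(x, w, d))
    print(f"dilation={d}: results match = {same}")
```

Integer inputs are used so the two paths can be compared exactly. In this sketch each output needs K x K multiplications instead of (d(K-1)+1) x (d(K-1)+1), which illustrates the kind of zero-operation saving the abstract refers to. How the paper maps such sub-convolutions onto channels and FPGA resources is not stated in the abstract.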