Exploration for Efficient Depthwise Separable Convolution Networks Deployment on FPGA

Zhijie Huang, Ao Qie, Chen Zhang, Jie Yang, Xin'an Wang
DOI: https://doi.org/10.1109/aicas59952.2024.10595964
2024-01-01
Abstract: Depthwise Separable Convolution (DSC) has become the key structure in lightweight convolutional neural networks. However, the tight connection between network structure and hardware architecture has not received enough attention, resulting in inefficient hardware performance and resource utilization. In this paper, we propose a software-hardware co-design framework to explore efficient hardware deployment of DSC networks. Beginning with an analysis of the network structure, the proposed framework employs pointwise-depthwise convolution layer-chaining to reduce on-chip memory. Furthermore, a multi-objective mathematical optimization model is proposed to explore the optimal architecture that balances performance and resource utilization. Additionally, we optimize the hardware design with reconfigurable line buffers, optimized DSP multiplications, and pipelined computation of standard and depthwise convolutions. Compared with reference designs, simulation results show that the proposed framework can, at a minimum, reduce computation time by 19.1% and improve DSP utilization by 27.5%. With MobileNetV2 implemented on a Xilinx XC7Z020 FPGA, the proposed design achieves the best computation resource efficiency of 0.58 GOPS/DSP, with only 3.2 Mb of on-chip memory.
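For readers unfamiliar with why DSC is considered lightweight, the following is a minimal sketch of the standard multiply-accumulate (MAC) count comparison between a standard convolution and its depthwise separable counterpart. It is not taken from the paper: the function names and the example layer shape (56x56 output, 64 input channels, 128 output channels, 3x3 kernel) are hypothetical and chosen only for illustration.

```python
def standard_conv_macs(h_out, w_out, c_in, c_out, k):
    """MACs for a standard k x k convolution producing an h_out x w_out x c_out output."""
    return h_out * w_out * c_in * c_out * k * k


def dsc_macs(h_out, w_out, c_in, c_out, k):
    """MACs for a depthwise separable convolution: k x k depthwise, then 1 x 1 pointwise."""
    depthwise = h_out * w_out * c_in * k * k   # one k x k filter per input channel
    pointwise = h_out * w_out * c_in * c_out   # 1 x 1 convolution mixes channels
    return depthwise + pointwise


# Hypothetical layer shape, assumed for illustration only.
H, W, C_IN, C_OUT, K = 56, 56, 64, 128, 3

std = standard_conv_macs(H, W, C_IN, C_OUT, K)
dsc = dsc_macs(H, W, C_IN, C_OUT, K)

print(f"standard conv MACs: {std}")
print(f"DSC MACs:           {dsc}")
print(f"DSC / standard:     {dsc / std:.4f}")
```

The ratio equals 1/C_OUT + 1/K^2, so for a 3x3 kernel the DSC layer needs roughly one ninth of the arithmetic. The depthwise stage has much less data reuse per weight than the pointwise stage, which is one reason the mapping of DSC layers onto DSPs and on-chip buffers is non-trivial.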