Hardware Implementation of Depthwise Separable Convolution Neural Network

Yancao Jiang, Jie Ren, Xiang Xie, Chun Zhang
DOI: https://doi.org/10.1109/icsict49897.2020.9278300
2020-01-01
Abstract: This paper presents an efficient architecture for a depthwise separable convolutional neural network. By simplifying the network architecture and adopting a 16-bit fixed-point format, the computation cost is substantially reduced with only a slight decrease in image classification precision on the CIFAR10 dataset. To improve energy efficiency and reduce memory access, a custom processing element (PE) is proposed that supports zero skipping, local accumulation and memory, and function multiplexing (for different convolution operations). A hardware architecture built on this custom PE is also proposed; it supports two dataflow modes, one for traditional convolution and one for depthwise separable convolution (DWC). The architecture is implemented on a Xilinx Zynq 7Z020 field-programmable gate array (FPGA) platform, and experimental results are reported. By exploiting parallelism and data reuse, post-synthesis simulation at a clock frequency of 50 MHz shows that the network achieves a peak performance of 9.6 GOPS and an energy efficiency of over 90.56 GOPS/W for single-frame runtime inference, a 59× higher energy efficiency than an Intel i5-8400 CPU. The proposed accelerator classifies each CIFAR10 image in 10 ms, or about 100 frames per second, which is a 41× speedup over the CPU and enables real-time image classification.
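To illustrate why depthwise separable convolution lowers computation cost, and what zero skipping saves inside a PE, the following is a minimal Python sketch. It is not the authors' design: the function names, the layer shape in the usage note, and the rule that a multiply is skipped when either operand is zero are assumptions made for illustration, since the abstract does not specify them.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates for a standard k x k convolution
    producing an h x w x c_out output (stride 1, 'same' padding assumed)."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates for a depthwise k x k convolution
    followed by a 1 x 1 pointwise convolution."""
    depthwise = h * w * c_in * k * k      # one k x k filter per input channel
    pointwise = h * w * c_in * c_out      # 1 x 1 convolution mixes channels
    return depthwise + pointwise


def zero_skipping_mac(activations, weights):
    """Accumulate products locally, skipping any multiply with a zero operand.

    Assumption: the skip condition (either operand equal to zero) is
    hypothetical; the abstract only states that the PE supports zero skipping.
    Returns (accumulated sum, number of multiplies actually performed).
    """
    acc = 0
    performed = 0
    for a, wt in zip(activations, weights):
        if a == 0 or wt == 0:
            continue                      # skipped: no multiply issued
        acc += a * wt
        performed += 1
    return acc, performed
```

For a hypothetical layer with a 32 × 32 output, 32 input channels, 64 output channels, and 3 × 3 kernels, `standard_conv_macs(32, 32, 32, 64, 3)` gives 18,874,368 multiply-accumulates, while `depthwise_separable_macs(32, 32, 32, 64, 3)` gives 2,392,064, roughly a 7.9× reduction. The reported latency of 10 ms per image corresponds to 1 / 0.010 s = 100 frames per second.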