A Parallel Processing CNN Accelerator on Embedded Devices Based on Optimized MobileNet

Dingkun Yang, Zhiyong Luo
DOI: https://doi.org/10.1109/jiot.2023.3277869
IF: 10.6
2023-01-01
IEEE Internet of Things Journal
Abstract: In machine vision and pattern recognition, the convolutional neural network (CNN) is one of the most active research topics. However, CNNs require complicated operations, which makes them exceptionally difficult to deploy on resource-constrained embedded devices. In this article, a parallel processing CNN accelerator based on an optimized MobileNet is presented. By implementing the fully connected layer of the MobileNet topology as a convolution and postponing the global pooling layer, the model topology is unified, which simplifies the design of the hardware accelerator. After the network model parameters are quantized to 8 bits, depthwise separable convolution is accelerated by parallel processing between channels and pipelined processing between layers, which improves the processing speed and throughput of the accelerator. On ImageNet classification, the designed accelerator achieves 580.6 frames per second (fps) on a ZYNQ AXZU5EV platform with a system power consumption of only 6.51 W. This result represents a $22.3\times$ speedup over a CPU and a $1.7\times$ speedup over a graphics processing unit (GPU), while the design consumes less power than either, providing a reference for the application of CNNs in embedded devices.
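The topology change described in the abstract can be illustrated with a small numerical check. The sketch below is not the authors' implementation. It assumes the fully connected layer is rewritten as a 1x1 (pointwise) convolution and that the pooling in question is global average pooling, and it uses hypothetical helper names (`global_avg_pool`, `fully_connected`, `pointwise_conv`) and toy dimensions. Under those assumptions, both orderings give the same class scores because both operations are linear.

```python
import random

def global_avg_pool(fmap):
    """Average each channel of fmap[c][y][x] over its spatial positions."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]

def fully_connected(vec, weights, bias):
    """Dense layer: weights[k][c] applied to vec[c], plus bias[k]."""
    return [sum(w * v for w, v in zip(wk, vec)) + b
            for wk, b in zip(weights, bias)]

def pointwise_conv(fmap, weights, bias):
    """1x1 convolution: the same K x C weight matrix is applied at every pixel."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[[sum(weights[k][c] * fmap[c][y][x] for c in range(channels)) + bias[k]
              for x in range(width)]
             for y in range(height)]
            for k in range(len(weights))]

# Toy sizes chosen for illustration only (assumption, not from the paper).
random.seed(0)
C, H, W, K = 4, 3, 3, 5
fmap = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)]
        for _ in range(C)]
weights = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(K)]
bias = [random.uniform(-1, 1) for _ in range(K)]

# Original ordering: global pooling, then the fully connected layer.
original = fully_connected(global_avg_pool(fmap), weights, bias)

# Unified ordering: 1x1 convolution on the full map, pooling postponed to the end.
unified = global_avg_pool(pointwise_conv(fmap, weights, bias))

max_diff = max(abs(a - b) for a, b in zip(original, unified))
print("largest difference between the two orderings:", max_diff)
```

The trade-off in this sketch is that the postponed ordering performs the K x C multiply-accumulate at every one of the H x W positions rather than once, in exchange for every layer before the final pooling being a convolution that one hardware datapath can serve.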
What problem does this paper attempt to address?