Network Structure Optimization and High-Efficiency Implementation of Skynet Based on FPGA

TANG Wei-wei, ZHONG Sheng, LU Jin-yi, YAN Lu-xin, TAN Fu-zhong, ZHOU Xu, XU Wen-hui
DOI: https://doi.org/10.12263/dzxb.20210028
2023-01-01
Abstract: Object detection algorithms based on convolutional neural networks (CNNs) offer strong robustness and high accuracy, and are widely used in computer vision tasks. However, the parameter size and computational cost of CNNs make them difficult to run in real time on edge computing platforms. To address this, this paper optimizes the structure of the object detection network Skynet and implements it on a field-programmable gate array (FPGA) using an efficient intra-layer parallel pipeline acceleration architecture. The method prunes Skynet, merges each convolutional layer with its normalization layer, applies the relative entropy (KL divergence) method and the maximum quantization method to quantize the weights and feature maps to 8-bit fixed point, converts the bias and scaling coefficients to fixed point, and merges the activation operation with the saturation truncation operation to speed up CNN forward computation. In addition, the paper converts the serial structure into a parallel pipeline structure based on the sliding-window operation, parallelizes the channel and pixel computations, and designs a pipeline strategy for depthwise separable convolution, which greatly reduces forward computation time. Experiments on the UA-DETRAC dataset show that the proposed method achieves a recognition accuracy of 0.752 and a frame rate of 115 FPS at an image resolution of 160×160, which is 11 times faster than the CPU and reaches 75% of the GPU's speed. Power consumption is reduced to 10.6% of the CPU's and 7.43% of the GPU's. Moreover, compared with similar FPGA-based CNN acceleration methods, the proposed method achieves the best performance in both speed and energy efficiency ratio.
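The layer-merging and quantization steps named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes a per-output-channel batch-normalization layer, symmetric max-abs scaling (the "maximum" method only; the KL-based calibration is not shown), and a plain ReLU as the activation. All function names (`fold_batchnorm`, `max_abs_scale`, `quantize`, `relu_saturate`) are hypothetical and chosen for illustration.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into conv weights and bias.

    weights: one flat list of weights per output channel (assumed layout).
    Returns (folded_weights, folded_bias) such that a single convolution
    reproduces convolution followed by batch normalization.
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)          # per-channel BN scale
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - mu) * scale + bt)
    return folded_w, folded_b


def max_abs_scale(values, bits=8):
    """Scale that maps the largest magnitude onto the signed integer range."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    return peak / qmax if peak > 0 else 1.0


def quantize(values, scale, bits=8):
    """Round to the nearest integer step and saturate to the signed range."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]


def relu_saturate(acc, bits=8):
    """ReLU and saturation truncation merged into one clamp to [0, qmax]."""
    qmax = 2 ** (bits - 1) - 1
    return max(0, min(qmax, acc))
```

Folding removes the normalization layer at inference time, and the merged clamp replaces two sequential comparisons with one, which is the kind of simplification that maps well onto fixed-point hardware. The scale computed by `max_abs_scale` would be applied per tensor here; a per-channel variant is an equally plausible choice that the abstract does not specify.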