FPGA Hardware Acceleration Design for Deep Learning

Haochen Shi
DOI: https://doi.org/10.54097/hset.v39i.6543
2023-04-01
Abstract: A convolutional neural network (CNN) is a type of artificial neural network that learns features from large amounts of data and performs very well in large-scale image processing. CNNs simulate the behavior of the biological optic nerve. In recent years, as deep neural network algorithms and hardware technology have developed, current "CPU+GPU" servers can no longer meet the demands of neural network structures across various fields, so a large number of FPGA-based deep CNN accelerators have gradually emerged. FPGAs are beginning to be used in image recognition and natural language processing because of their programmability, high performance, high stability, high security, and low power consumption. Although FPGAs have proven to deliver better performance, there is still room for optimization at the design level. YOLOv3, a classic object detection algorithm, still consumes a great deal of time and computational resources in practice. To address this problem, this experiment partially optimizes YOLOv3 by introducing the CBAM (Convolutional Block Attention Module) attention mechanism into the model and by pruning the network at different ratios with the Network Slimming method for embedded deployment. The optimized model is then verified on an NVIDIA Jetson TX2 embedded device using the COCO dataset. The experiments compare the precision, mAP, and number of parameters of the optimized YOLOv3 under the different optimization strategies. The results show that YOLOv3 still admits further optimization strategies that can more effectively reduce computation time and memory usage without any degradation in accuracy.
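To illustrate what pruning "at different ratios" with Network Slimming can mean, here is a minimal Python sketch of the general Network Slimming idea. During training, an L1 penalty on the batch-normalization scaling factors γ pushes unimportant channels toward zero. Afterwards, one global threshold on |γ| is chosen so that a given fraction of all channels is removed. The layer names, γ values, and the rule of keeping at least one channel per layer are illustrative assumptions, not details taken from this paper. The sketch only computes keep-masks; it does not rebuild or fine-tune a real YOLOv3 network.

```python
from typing import Dict, List


def l1_sparsity_penalty(bn_gammas: Dict[str, List[float]], lam: float) -> float:
    """L1 term lam * sum(|gamma|) added to the training loss to encourage sparse BN scales."""
    return lam * sum(abs(g) for gammas in bn_gammas.values() for g in gammas)


def slimming_masks(bn_gammas: Dict[str, List[float]], ratio: float) -> Dict[str, List[bool]]:
    """Return per-layer keep-masks that prune a global fraction `ratio` of channels.

    bn_gammas maps a (hypothetical) layer name to the BN scaling factors of its channels.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    all_abs = sorted(abs(g) for gammas in bn_gammas.values() for g in gammas)
    k = int(len(all_abs) * ratio)  # number of channels to prune across the whole network
    # Global threshold: channels with |gamma| at or below the k-th smallest value are pruned.
    threshold = all_abs[k - 1] if k > 0 else float("-inf")
    masks: Dict[str, List[bool]] = {}
    for name, gammas in bn_gammas.items():
        mask = [abs(g) > threshold for g in gammas]
        if gammas and not any(mask):
            # Assumption: keep the strongest channel so no layer is emptied entirely.
            best = max(range(len(gammas)), key=lambda i: abs(gammas[i]))
            mask[best] = True
        masks[name] = mask
    return masks


if __name__ == "__main__":
    # Hypothetical BN scaling factors for two layers (8 channels in total).
    gammas = {"conv1": [0.9, 0.01, 0.5, 0.02], "conv2": [0.03, 0.04, 0.7, 0.6]}
    for r in (0.25, 0.5):
        masks = slimming_masks(gammas, r)
        kept = sum(sum(m) for m in masks.values())
        print(f"ratio={r}: kept {kept} of 8 channels")
    # prints:
    # ratio=0.25: kept 6 of 8 channels
    # ratio=0.5: kept 4 of 8 channels
```

A single global threshold, rather than a fixed ratio per layer, lets layers whose channels matter less lose proportionally more channels. This is the reason different overall pruning ratios can trade parameter count against accuracy.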
Computer Science, Engineering