Abstract: A convolutional neural network (CNN) is a type of artificial neural network that can learn features from large amounts of data and performs very well in large-scale image processing. The CNN simulates the behavior of the biological optic nerve. In recent years, with the development of deep neural network algorithms and hardware technology, servers based on the current "CPU+GPU" model have been unable to meet the demands of neural network structures in various fields, so a large number of deep CNN accelerators based on the FPGA platform have gradually emerged. FPGAs are beginning to be used in image recognition and natural language processing because of their programmability, high performance, high stability, high security, and low power consumption. Although FPGAs have been shown to deliver better performance, there is still room for optimization at the design level. Yolov3, a classical algorithm, still consumes considerable time and computational resources in practice. To address this problem, this experiment partially optimizes the Yolov3 algorithm by introducing the CBAM attention mechanism into the Yolov3 model and pruning the network at different proportions using the Network slimming method. Finally, the optimized model is verified on a TX2 embedded device developed by Nvidia using the COCO dataset. The experiment reports the precision, mAP, and number of parameters of the optimized Yolov3 algorithm under different optimization strategies. The results show that further optimization strategies for Yolov3 remain available that can more effectively reduce computation time and memory usage without any degradation in accuracy.
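The pruning step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual Network slimming formulation, in which channels are ranked by the absolute value of their batch-normalization scaling factors and the smallest ones are removed according to a global pruning proportion. The function name `slimming_masks`, its parameters, and the example values are hypothetical and are not taken from the paper's implementation.

```python
def slimming_masks(gammas_per_layer, prune_ratio):
    """Return one keep-mask per layer for a global pruning proportion.

    gammas_per_layer: list of lists of batch-norm scaling factors (assumed input).
    prune_ratio: fraction of all channels to remove, in [0, 1).
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    # Pool the absolute scaling factors from every layer and sort them.
    all_abs = sorted(abs(g) for layer in gammas_per_layer for g in layer)
    n_prune = int(len(all_abs) * prune_ratio)
    # Channels whose |gamma| is strictly below this threshold are pruned.
    threshold = all_abs[n_prune] if n_prune > 0 else float("-inf")
    masks = []
    for layer in gammas_per_layer:
        mask = [abs(g) >= threshold for g in layer]
        # Guard (an assumption of this sketch): keep at least one channel per layer.
        if layer and not any(mask):
            best = max(range(len(layer)), key=lambda i: abs(layer[i]))
            mask[best] = True
        masks.append(mask)
    return masks


# Hypothetical scaling factors for two layers, pruned at a 50% proportion.
example = slimming_masks([[0.9, 0.01, 0.5], [0.02, 0.03, 0.8]], 0.5)
```

Running the sketch with several values of `prune_ratio` mirrors the idea of pruning at different proportions; in a real workflow the masks would be used to rebuild a thinner network, which is then fine-tuned.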

FPGA based Flexible Implementation of Light Weight Inference on Deep Convolutional Neural Networks

FPGA Oriented Lightweight Deep Learning Inference for Liver Cancer Segmentation

Efficient Inference of Large-Scale and Lightweight Convolutional Neural Networks on FPGA

Deploying deep learning networks based advanced techniques for image processing on FPGA platform

A Novel Design of Adaptive and Hierarchical Convolutional Neural Networks using Partial Reconfiguration on FPGA

A flexible FPGA accelerator for convolutional neural networks

Design of Convolutional Neural Network Based on FPGA

A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA

An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs

Efficient Hardware Architectures for Deep Convolutional Neural Network

FPGA Hardware Acceleration Design for Deep Learning

Accelerating Low Bit-Width Convolutional Neural Networks with Embedded FPGA

FPGA Implementations of 3D-SIMD Processor Architecture for Deep Neural Networks Using Relative Indexed Compressed Sparse Filter Encoding Format and Stacked Filters Stationary Flow

FusionAccel: A General Re-configurable Deep Learning Inference Accelerator on FPGA for Convolutional Neural Networks

A Flexible and Efficient FPGA Accelerator for Various Large-Scale and Lightweight CNNs

Systematic realization of a fully connected deep and convolutional neural network architecture on a field programmable gate array

Towards Enabling Dynamic Convolution Neural Network Inference for Edge Intelligence

Deep neural network accelerator based on FPGA

A High Performance Reconfigurable Hardware Architecture for Lightweight Convolutional Neural Network

Face Recognition with Hybrid Efficient Convolution Algorithms on FPGAs