Abstract: A neural network acceleration and optimization method for FPGA hardware platforms is proposed. The method optimizes the deployment of neural network algorithms on FPGAs in three respects: computational speed, portability, and development method. First, replacing multiplication with an approximation based on Mitchell's algorithm removes the speed bottleneck that long multiplication cycles impose on hardware acceleration. It also frees parallel acceleration from its dependence on the number of hardware multipliers in the FPGA, so the parallelism of the FPGA can be fully exploited to maximize computing speed. Second, a configurable-parameter strategy allows the number of network layers and the number of nodes per layer to be adjusted to the logic resources of a given FPGA, which makes the network easier to port. Third, adopting the HLS development method avoids the drawbacks of the RTL method for complex neural network algorithms, namely high development difficulty and long development cycles. Using the Cyclone V SE 5CSEBA6U23I7 FPGA as the target device, a parameter-configurable BP neural network was designed with the proposed method. Its usage of ALUT, flip-flop, RAM, and DSP resources was 39.6%, 40%, 56.9%, and 18.3% of the pre-optimization usage, respectively. The feasibility of the method was verified with MNIST digit recognition and facial recognition as application scenarios. Compared with the pre-optimization design, the test time for MNIST digit recognition was reduced to 67.58% of its original value, with a 0.195% loss in success rate. The test time for facial recognition was reduced to 69.571% of its original value, and the loss in success rate when combined with the LDA algorithm stayed within 4%.
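To illustrate the kind of multiplier replacement the abstract refers to, the following is a minimal Python sketch of the textbook Mitchell logarithmic approximation for unsigned integers. It is not the authors' HLS implementation, which the abstract does not detail; the function name `mitchell_multiply` and the fixed-point layout are assumptions made for this example. Each operand is written as 2^k (1 + x) with 0 <= x < 1, log2 is approximated by k + x, and the product is recovered from the sum of the two approximate logarithms using only shifts and additions.

```python
def mitchell_multiply(a: int, b: int) -> int:
    """Approximate a * b for non-negative integers with Mitchell's method.

    Hypothetical helper for illustration only: uses shifts and adds,
    never a true multiplication.
    """
    if a == 0 or b == 0:
        return 0

    # Characteristic: position of the leading one bit, so a = 2^k1 * (1 + x1).
    k1 = a.bit_length() - 1
    k2 = b.bit_length() - 1

    # Fractions x1, x2 as fixed-point values scaled by 2^F.
    F = max(k1, k2)
    x1 = (a - (1 << k1)) << (F - k1)
    x2 = (b - (1 << k2)) << (F - k2)

    s = x1 + x2          # x1 + x2, scaled by 2^F
    k = k1 + k2          # sum of the characteristics

    if s < (1 << F):
        # x1 + x2 < 1: product ~= 2^k * (1 + x1 + x2)
        return (((1 << F) + s) << k) >> F
    # x1 + x2 >= 1: product ~= 2^(k + 1) * (x1 + x2)
    return (s << (k + 1)) >> F


if __name__ == "__main__":
    for a, b in [(8, 4), (5, 6), (3, 3)]:
        print(a, b, a * b, mitchell_multiply(a, b))
```

The approximation is exact when both operands are powers of two and otherwise underestimates the true product, with a worst-case relative error of about 11.1% (reached when both fractions equal 0.5). That bounded error is consistent with the small loss in recognition success rate reported above, although the abstract does not state whether any error compensation was applied.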

Neural network accelerator for bit width partitioning and implementation method of neural network accelerator

Leveraging Bit-Serial Architectures for Hardware-Oriented Deep Learning Accelerators with Column-Buffering Dataflow

A High Performance Multi-Bit-Width Booth Vector Systolic Accelerator for NAS Optimized Deep Learning Neural Networks

Parallel Hybrid Stochastic-Binary-Based Neural Network Accelerators

Reconfigurable neural network acceleration method and architecture

A Data-Driven Asynchronous Neural Network Accelerator

A Fine-Grained Sparse Accelerator for Multi-Precision DNN

AccEPT: an Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training

A Low-Power Sparse Convolutional Neural Network Accelerator with Pre-Encoding Radix-4 Booth Multiplier

A fine-grained mixed precision DNN accelerator using a two-stage big-little core RISC-V MCU

DeepBurning-MixQ: An Open Source Mixed-Precision Neural Network Accelerator Design Framework for FPGAs

Bit-width Adaptive Accelerator Design for Convolution Neural Network

Efficient Hardware Optimization Strategies For Deep Neural Networks Acceleration Chip

Work-in-Progress: A High-performance FPGA Accelerator for Sparse Neural Networks

Neural Synaptic Plasticity-Inspired Computing: A High Computing Efficient Deep Convolutional Neural Network Accelerator

Improving Efficiency in Neural Network Accelerator Using Operands Hamming Distance Optimization

A neural network accelerated optimization method for FPGA

Design of a Convolutional Neural Network Accelerator Based on On-Chip Data Reordering

Ifpna: A Flexible and Efficient Deep Neural Network Accelerator with a Programmable Data Flow Engine in 28nm CMOS

Separable array-based reconfigurable accelerator and realization method thereof

An Efficient Streaming Accelerator for Low Bit-Width Convolutional Neural Networks