Abstract: A neural network acceleration and optimization method for FPGA hardware platforms is proposed. The method optimizes the deployment of neural network algorithms on FPGAs in three respects: computational speed, portability, and development method. First, multiplication is replaced with an approximation based on the Mitchell algorithm. This removes the speed bottleneck that long multiplication cycles impose on neural network hardware acceleration, and it frees parallel acceleration from its dependence on the number of hardware multipliers in the FPGA, so that the FPGA's parallelism can be fully exploited and computing speed maximized. Second, a configurable-parameter strategy allows the number of network layers and the number of nodes per layer to be adjusted to the logic resources available on a given FPGA, which makes the network easier to port. Third, the HLS development method avoids the drawbacks of RTL design for complex neural network algorithms, namely high development difficulty and long development cycles. Using the Cyclone V SE 5CSEBA6U23I7 FPGA as the target device, a parameter-configurable BP neural network was designed with the proposed method. Its usage of ALUT, flip-flop, RAM, and DSP resources was 39.6%, 40%, 56.9%, and 18.3% of the pre-optimization usage, respectively. The feasibility of the method was verified on two applications: MNIST digit recognition and facial recognition. Compared with the pre-optimization design, the MNIST test time was reduced to 67.58% of its original value, and the success rate dropped by 0.195%. For facial recognition, the test time was reduced to 69.571% of its original value, and the loss in success rate when combined with the LDA algorithm was within 4%.
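The abstract does not describe how the Mitchell-based replacement is implemented, so the following is only a minimal software sketch of the textbook Mitchell logarithmic multiplication for unsigned integers, not the authors' HLS design. It approximates log2 of each operand as (position of the leading one) + (remaining bits as a fraction), adds the two logarithms, and converts back, using only shifts, additions, and a comparison. The function name `mitchell_multiply` is hypothetical and chosen for illustration.

```python
def mitchell_multiply(a: int, b: int) -> int:
    """Approximate a * b for non-negative integers with Mitchell's method.

    Write a = 2**k1 * (1 + x1) and b = 2**k2 * (1 + x2) with 0 <= x1, x2 < 1.
    Mitchell approximates log2(a) as k1 + x1, so the product becomes
      2**(k1 + k2)     * (1 + x1 + x2)   if x1 + x2 < 1
      2**(k1 + k2 + 1) * (x1 + x2)       otherwise.
    Only shifts, additions, and one comparison are needed.
    """
    if a < 0 or b < 0:
        raise ValueError("this sketch handles non-negative integers only")
    if a == 0 or b == 0:
        return 0

    k1 = a.bit_length() - 1  # position of the leading one in a
    k2 = b.bit_length() - 1  # position of the leading one in b

    # (x1 + x2) scaled by 2**(k1 + k2), computed with shifts only.
    frac_sum = ((a - (1 << k1)) << k2) + ((b - (1 << k2)) << k1)
    one = 1 << (k1 + k2)  # the value 1.0 at the same scale

    if frac_sum < one:
        return one + frac_sum   # 2**(k1+k2) * (1 + x1 + x2)
    return frac_sum << 1        # 2**(k1+k2+1) * (x1 + x2)


if __name__ == "__main__":
    for a, b in [(4, 8), (5, 6), (3, 3), (200, 117)]:
        approx = mitchell_multiply(a, b)
        exact = a * b
        print(f"{a} * {b}: exact={exact}, mitchell={approx}")
```

The result is exact when either operand is a power of two and otherwise underestimates the true product, with a relative error of at most 1/9 (about 11.1%). That error is the price paid for removing the multiplier, which is consistent with the small loss in success rate reported above.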

DIF-LUT: A Simple Yet Scalable Approximation for Non-Linear Activation Function on FPGA

Fast and Low-Cost Approximate Multiplier for FPGAs using Dynamic Reconfiguration

A Reconfigurable Approximate Multiplier for Quantized CNN Applications.

LUTNet: Rethinking Inference in FPGA Soft Logic

NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions

PolyLUT: Learning Piecewise Polynomials for Ultra-Low Latency FPGA LUT-based Inference

LUTNet: Learning FPGA Configurations for Highly Efficient Neural Network Inference

FHAM: FPGA-based High-Efficiency Approximate Multipliers Via LUT Encoding

QUADOL: A Quality-Driven Approximate Logic Synthesis Method Exploiting Dual-Output LUTs for Modern FPGAs

LUTMUL: Exceed Conventional FPGA Roofline Limit by LUT-based Efficient Multiplication for Neural Network Inference

Explore Efficient LUT-based Architecture for Quantized Convolutional Neural Networks on FPGA

INA: Incremental Network Approximation Algorithm for Limited Precision Deep Neural Networks

DALTA: A Decomposition-based Approximate Lookup Table Architecture

First hydroxamate inhibitors for carboxypeptidase A. N-acyl-N-hydroxy-beta-phenylalanines.

LUT‐DSP usage trade‐off for re‐configurable convolution acceleration core based on small logarithmic floating point representation

PolyLUT-Add: FPGA-based LUT Inference with Wide Inputs

A Novel Approximation Methodology and Its Efficient VLSI Implementation for the Sigmoid Function.

FPGA Implementation for the Sigmoid with Piecewise Linear Fitting Method Based on Curvature Analysis

A neural network accelerated optimization method for FPGA

An All-Digital Compute-In-Memory FPGA Architecture for Deep Learning Acceleration

FPGA-Based Convolutional Neural Network Accelerator with Resource-Optimized Approximate Multiply-Accumulate Unit