Abstract: Long Short-Term Memory (LSTM) networks are widely used to solve sequence modeling problems. A common approach among researchers is to build a complete algorithm around an LSTM core, combining it with pre-processing and post-processing steps. Field Programmable Gate Arrays (FPGAs), with their low power consumption and low latency, are an ideal hardware platform for LSTM inference and can accelerate the execution of such algorithms. However, implementing LSTM networks on FPGAs requires specialized hardware and software knowledge as well as optimization skills, which is a challenge for researchers. To reduce the difficulty of deploying LSTM networks on FPGAs, we propose F-LSTM, an FPGA-based framework for heterogeneous computing. With F-LSTM, researchers can quickly deploy LSTM-based algorithms to heterogeneous computing platforms: the FPGA automatically takes over the computation of the LSTM network, while the CPU performs the algorithm's pre-processing and post-processing. To support algorithm design, model compression, and deployment, we also propose a framework based on F-LSTM, which integrates PyTorch to increase usability. Experimental results on sentiment analysis tasks show that deploying algorithms to the F-LSTM hardware platform can achieve a 1.8× performance improvement and a 5.4× energy-efficiency improvement compared to a GPU. The experimental results also validate the need to build heterogeneous computing systems. In conclusion, our work reduces the difficulty of deploying LSTM networks on FPGAs while guaranteeing algorithm performance compared to traditional work.
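
The split between CPU and FPGA work described in the abstract can be illustrated with a minimal pure-Python sketch. This is not F-LSTM's actual interface, which the abstract does not specify. The names `preprocess`, `lstm_core`, `postprocess`, `classify`, and the `backend` parameter are hypothetical, and the LSTM step is the standard textbook cell running on the CPU as a stand-in for the part that would be offloaded to the FPGA.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
Weights = Dict[str, List[Vector]]  # gate name ("i", "f", "o", "g") -> weight matrix
Biases = Dict[str, Vector]         # gate name -> bias vector


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def preprocess(text: str, embeddings: Dict[str, Vector]) -> List[Vector]:
    """CPU side (assumed): whitespace tokenisation and a toy embedding lookup."""
    return [embeddings[tok] for tok in text.lower().split() if tok in embeddings]


def lstm_core(seq: List[Vector], weights: Weights, biases: Biases,
              hidden_size: int) -> Vector:
    """Stand-in for the LSTM computation that would run on the FPGA.

    Implements the standard LSTM cell and returns the final hidden state.
    """
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in seq:
        z = x + h  # concatenate input and previous hidden state
        pre = {
            gate: [sum(w * v for w, v in zip(row, z)) + b
                   for row, b in zip(weights[gate], biases[gate])]
            for gate in "ifog"
        }
        i = [_sigmoid(v) for v in pre["i"]]   # input gate
        f = [_sigmoid(v) for v in pre["f"]]   # forget gate
        o = [_sigmoid(v) for v in pre["o"]]   # output gate
        g = [math.tanh(v) for v in pre["g"]]  # candidate cell state
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h


def postprocess(h: Vector, w_out: Vector) -> float:
    """CPU side (assumed): a linear layer plus sigmoid giving a sentiment score."""
    return _sigmoid(sum(w * v for w, v in zip(w_out, h)))


def classify(text: str, embeddings: Dict[str, Vector], weights: Weights,
             biases: Biases, w_out: Vector, hidden_size: int,
             backend: Callable[..., Vector] = lstm_core) -> float:
    """Run the whole algorithm; only the `backend` call would be offloaded."""
    seq = preprocess(text, embeddings)                  # CPU
    h = backend(seq, weights, biases, hidden_size)      # accelerator (here: CPU stand-in)
    return postprocess(h, w_out)                        # CPU
```

Because the LSTM core sits behind a single callable, the surrounding pre-processing and post-processing code stays unchanged when the backend is swapped, which is the kind of separation a heterogeneous deployment relies on.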

A Cloud Server Oriented FPGA Accelerator for LSTM Recurrent Neural Network

Cloud Server Oriented FPGA Accelerator for Long Short-Term Memory Recurrent Neural Networks

A Near Memory Computing FPGA Architecture for Neural Network Acceleration

FPGA-based Accelerator for Long Short-Term Memory Recurrent Neural Networks

An LSTM Acceleration Engine for FPGAs Based on Caffe Framework

Implementation and Optimization of the Accelerator Based on FPGA Hardware for LSTM Network

A Power-Efficient Accelerator Based on FPGAs for LSTM Network

An FPGA-Based LSTM Acceleration Engine for Deep Learning Frameworks

FPGA Acceleration of LSTM Based on Data for Test Flight

An Instruction-Driven Batch-Based High-Performance Resource-Efficient LSTM Accelerator on FPGA

F-LSTM: FPGA-Based Heterogeneous Computing Framework for Deploying LSTM-Based Algorithms

A High Energy-Efficiency FPGA-Based LSTM Accelerator Architecture Design by Structured Pruning and Normalized Linear Quantization

ConvCloud: An Adaptive Convolutional Neural Network Accelerator on Cloud FPGAs

FiC-RNN: A Multi-FPGA Acceleration Framework for Deep Recurrent Neural Networks

A Deep Learning Prediction Process Accelerator Based FPGA

A Spiking LSTM Accelerator for Automatic Speech Recognition Application Based on FPGA

FPGA Acceleration of Recurrent Neural Network Based Language Model

A Data-Center FPGA Acceleration Platform for Convolutional Neural Networks

FPGA Implementation of LSTM Based on Automatic Speech Recognition

Efficient Weight Reuse for Large LSTMs

A Memory-Optimized and Energy-Efficient CNN Acceleration Architecture Based on FPGA