Abstract: Recurrent Neural Networks (RNNs) are becoming increasingly important for time-series applications that require efficient, real-time implementations. The two major types are Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. Real-time, efficient, and accurate hardware RNN implementations are challenging because RNNs are highly sensitive to accumulated imprecision and require special activation function implementations. A key limitation of prior work is the lack of a systematic design optimization framework covering both the RNN model and its hardware implementation, especially when the block size (or compression ratio) must be jointly optimized with RNN type, layer size, etc. In this paper, we adopt the block-circulant matrix-based framework and present the Efficient RNN (E-RNN) framework for FPGA implementations of the Automatic Speech Recognition (ASR) application. The overall goal is to improve performance and energy efficiency under an accuracy requirement. We use the alternating direction method of multipliers (ADMM) technique for more accurate block-circulant training, and present two design explorations that provide guidance on block size and reduce the number of RNN training trials. Based on these two observations, we decompose E-RNN into two phases: Phase I determines the RNN model to reduce computation and storage subject to the accuracy requirement, and Phase II addresses hardware implementation for the given RNN model, including processing element design/optimization, quantization, activation implementation, etc. Experimental results on actual FPGA deployments show that E-RNN achieves a maximum energy efficiency improvement of 37.4$\times$ compared with ESE, and more than 2$\times$ compared with C-LSTM, under the same accuracy.
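To make the link between block size and compression ratio concrete, the following is a minimal sketch of a block-circulant matrix-vector product in plain Python. It is an illustration under stated assumptions, not the E-RNN implementation: the function names, the indexing convention (entry (i, j) of a block equals w[(i - j) mod b]), and the toy weights are all hypothetical. Each b x b block is stored as a single length-b defining vector, so a block of size b stores b weights instead of b*b.

```python
def circulant_block(w):
    """Expand a defining vector w of length b into a dense b x b circulant block.

    Assumed convention (not taken from the paper): entry (i, j) is w[(i - j) mod b].
    """
    b = len(w)
    return [[w[(i - j) % b] for j in range(b)] for i in range(b)]


def block_circulant_matvec(blocks, x):
    """Compute y = W x where W is stored only as per-block defining vectors.

    blocks[p][q] is the length-b defining vector of block (p, q).
    """
    b = len(blocks[0][0])
    n_block_cols = len(blocks[0])
    if len(x) != n_block_cols * b:
        raise ValueError("x length must equal (number of block columns) * b")
    y = [0] * (len(blocks) * b)
    for p, block_row in enumerate(blocks):
        for q, w in enumerate(block_row):
            x_q = x[q * b:(q + 1) * b]
            for i in range(b):
                # Row i of a circulant block reads w at index (i - j) mod b.
                y[p * b + i] += sum(w[(i - j) % b] * x_q[j] for j in range(b))
    return y


def dense_matvec(blocks, x):
    """Reference path: expand every block to dense form, then multiply."""
    b = len(blocks[0][0])
    y = []
    for block_row in blocks:
        dense = [circulant_block(w) for w in block_row]
        for i in range(b):
            y.append(sum(dense[q][i][j] * x[q * b + j]
                         for q in range(len(block_row)) for j in range(b)))
    return y


def compression_ratio(b):
    """A dense block stores b*b weights; its circulant form stores b."""
    return (b * b) // b


# Toy example (hypothetical weights): one block row, two block columns, b = 2.
blocks = [[[1, 2], [3, 4]]]
x = [1, 2, 3, 4]
y = block_circulant_matvec(blocks, x)
```

The compact and dense paths agree on this example, and `compression_ratio(b)` returns `b`, which is why a larger block size gives more compression. The direct loop above costs O(b^2) per block and is meant only to show the structure.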

Recurrent Neural Networks Hardware Implementation on FPGA

The implementation of a Deep Recurrent Neural Network Language Model on a Xilinx FPGA

A Compact and Configurable Long Short-Term Memory Neural Network Hardware Architecture

A Hardware Implementation of SNN-Based Spatio-Temporal Memory Model

E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs

FPGA Acceleration of Recurrent Neural Network Based Language Model

An Efficient Reconfigurable Framework for General Purpose CNN-RNN Models on FPGAs

FPGA-based Accelerator for Long Short-Term Memory Recurrent Neural Networks

FiC-RNN: A Multi-FPGA Acceleration Framework for Deep Recurrent Neural Networks

A General Neural Network Hardware Architecture on FPGA

Hardware implementation of spiking neural networks on FPGA

A 3.89-GOPS/mW Scalable Recurrent Neural Network Processor with Improved Efficiency on Memory and Computation

Implementation and Optimization of the Accelerator Based on FPGA Hardware for LSTM Network

Efficient Recurrent Neural Networks using Structured Matrices in FPGAs

Accelerating RNNs on FPGA with HBM

A Highly Configurable 7.62GOP/s Hardware Implementation for LSTM

Spike Trains Encoding Optimization for Spiking Neural Networks Implementation in FPGA

Accurate Mapping of RNNs on Neuromorphic Hardware with Adaptive Spiking Neurons

Deep neural network accelerator based on FPGA

A Power-Efficient Accelerator Based on FPGAs for LSTM Network

An FPGA Implementation of Deep Spiking Neural Networks for Low-Power and Fast Classification