Implementation and Optimization of the Accelerator Based on FPGA Hardware for LSTM Network

Yiwei Zhang, Chao Wang, Lei Gong, Yuntao Lu, Fan Sun, Chongchong Xu, Xi Li, Xuehai Zhou
DOI: https://doi.org/10.1109/ispa/iucc.2017.00098
2017-01-01
Abstract: Artificial neural networks (ANNs) are important machine learning methods that are widely used in a variety of applications. As an emerging class of ANNs, recurrent neural networks (RNNs) are often used for sequence-related applications. Long Short-Term Memory (LSTM) is an improved RNN that contains complex computational logic. To achieve high accuracy, researchers often build large-scale LSTM networks, which are time-consuming and power-consuming. Accelerating LSTM networks while keeping power and energy consumption low has therefore become an active research topic. In this paper, we present a hardware accelerator for the LSTM neural network layer based on the Zedboard FPGA platform and use pipelining to parallelize the forward computation. To optimize our implementation, we also use several techniques, including tiled matrix-vector multiplication, a binary adder tree, and overlapping computation with data access. With these acceleration and optimization methods, our accelerator is power-efficient and outperforms both an ARM Cortex-A9 processor and an Intel Core i5 processor.
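Two of the optimizations named in the abstract, tiled matrix-vector multiplication and the binary adder tree, can be illustrated with a small software sketch. The abstract does not give the accelerator's actual design, so everything below is an assumption made for illustration: the function names `adder_tree_sum` and `tiled_matvec`, the tile size, and the choice to tile along the input columns are all hypothetical. The code models only the arithmetic structure, not the hardware pipeline or the overlap of computation with data access.

```python
def adder_tree_sum(values):
    """Sum values pairwise, level by level, as a binary adder tree would."""
    level = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(0.0)  # pad an odd-sized level with a zero input
        # Each pass models one tree level: adjacent pairs are added together.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def tiled_matvec(matrix, vector, tile_size=4):
    """Compute matrix @ vector one column tile at a time.

    tile_size is a hypothetical parameter standing in for the number of
    elements an on-chip buffer could hold; it is not taken from the paper.
    """
    rows = len(matrix)
    cols = len(vector)
    result = [0.0] * rows
    for start in range(0, cols, tile_size):
        end = min(start + tile_size, cols)
        x_tile = vector[start:end]  # the slice a local buffer would hold
        for r in range(rows):
            # Elementwise products for this tile, reduced by the adder tree.
            products = [matrix[r][c] * x_tile[c - start] for c in range(start, end)]
            result[r] += adder_tree_sum(products)
    return result
```

In hardware, the products within a tile would typically be computed in parallel and the adder tree would reduce them in a logarithmic number of stages; in this sketch both steps run sequentially, so it shows the data flow rather than the speedup.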