An LSTM Acceleration Engine for FPGAs Based on Caffe Framework
Junhua He, Dazhong He, Yang Yang, Jun Liu, Jie Yang, Siye Wang
DOI: https://doi.org/10.1109/ICCC47050.2019.9064358
2019-01-01
Abstract: Recently, Long Short-Term Memory (LSTM) networks have been widely used in sequence-related problems. LSTMs outperform conventional feed-forward neural networks and RNNs in many ways, since they selectively remember patterns over long durations. However, due to the recurrent nature of LSTMs, it is hard to achieve high computing parallelism on general-purpose processors such as CPUs and GPUs. Besides, the high energy consumption of GPU and CPU computing is a non-negligible issue for data centers. FPGAs emerge as an ideal solution to both problems: their low power consumption and low latency are natural advantages for implementing recurrent neural networks such as LSTMs. In this paper, we propose an FPGA-based acceleration engine for LSTM networks. We optimize the LSTM on the FPGA in depth by employing fixed-point arithmetic, systolic arrays for matrix multiplication, and lookup tables for the activation functions. Additionally, we integrate the acceleration engine into Caffe, one of the most popular deep learning frameworks, to make it easier to deploy. According to the experimental results, our acceleration engine achieves 8.8X and 2.2X gains in performance and 16.9X and 9.6X gains in energy efficiency compared with CPU and GPU, respectively.
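To make two of the optimizations named in the abstract concrete, the following is a minimal software sketch of fixed-point arithmetic combined with a lookup-table sigmoid, applied to a single LSTM-style gate. It is an illustration only, not the paper's hardware design: the abstract does not state the number format or table size, so the 16-bit Q8.8 format, the 256-entry table over [-8, 8), and all function names here are assumptions.

```python
import math

# Assumed format (not from the paper): 16-bit signed, 8 fractional bits (Q8.8).
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS
INT_MIN, INT_MAX = -(1 << 15), (1 << 15) - 1


def saturate(q):
    """Clamp an integer to the 16-bit signed range."""
    return max(INT_MIN, min(INT_MAX, q))


def to_fixed(x):
    """Quantize a float to a saturated fixed-point integer."""
    return saturate(int(round(x * SCALE)))


def to_float(q):
    """Convert a fixed-point integer back to a float."""
    return q / SCALE


def fixed_mul(a, b):
    """Multiply two fixed-point values and rescale the product."""
    return saturate((a * b) >> FRAC_BITS)


# Assumed table: 256 entries covering inputs in [-8, 8).
LUT_BITS = 8
LUT_HALF_RANGE = 8 << FRAC_BITS                 # 8.0 in fixed point
INDEX_SHIFT = (FRAC_BITS + 4) - LUT_BITS        # 4096 input codes -> 256 entries
LUT_STEP = 16.0 / (1 << LUT_BITS)

SIGMOID_LUT = [
    to_fixed(1.0 / (1.0 + math.exp(-(-8.0 + i * LUT_STEP))))
    for i in range(1 << LUT_BITS)
]


def sigmoid_lut(q):
    """Approximate sigmoid using only integer compare, add, shift, and lookup."""
    if q < -LUT_HALF_RANGE:
        return 0                                # sigmoid saturates to 0
    if q >= LUT_HALF_RANGE:
        return SCALE                            # sigmoid saturates to 1.0
    return SIGMOID_LUT[(q + LUT_HALF_RANGE) >> INDEX_SHIFT]


def gate(weights, inputs, bias):
    """One gate output: sigmoid(w . x + b), all operands in fixed point."""
    acc = bias                                  # wide accumulator (Python int)
    for w, x in zip(weights, inputs):
        acc += (w * x) >> FRAC_BITS
    return sigmoid_lut(saturate(acc))


if __name__ == "__main__":
    w = [to_fixed(v) for v in (0.5, -0.25)]
    x = [to_fixed(v) for v in (1.0, 2.0)]
    print(to_float(gate(w, x, to_fixed(0.0))))
```

In this sketch the table index is derived from the fixed-point input with a single add and shift, which is the property that makes a lookup table cheap in hardware; a finer table or interpolation would trade memory for accuracy.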