A Compact and Configurable Long Short-Term Memory Neural Network Hardware Architecture.

Kewei Chen, Leilei Huang, Minjiang Li, Xiaoyang Zeng, Yibo Fan
DOI: https://doi.org/10.1109/icip.2018.8451053
2018-01-01
Abstract: Neural networks have been among the most useful techniques in image analysis and speech recognition in recent years. Long Short-Term Memory (LSTM), a popular type of recurrent neural network (RNN), has been widely implemented on CPUs and GPUs. However, software implementations cannot offer sufficient parallelism for the complex computation of LSTM, and most LSTM hardware implementations proposed so far are resource-intensive and non-configurable. To accelerate the computation and reduce resource consumption, this work presents a compact and configurable LSTM neural network hardware architecture. To meet the requirements of different networks, we provide a wide array of configurable hardware parameters that balance area, power, and performance. We also adopt second-order polynomials to approximate the activation functions in LSTM, balancing computational accuracy against resource utilization. Implemented on an XCZU6EG FPGA running at 238 MHz, our design achieves a performance of 7.64 GOP/s. Compared with an implementation on an Intel Xeon E5-2620 CPU at 2.10 GHz, our parallel hardware architecture achieves a 90× speedup for a small network and a 25× speedup for a large one. Its total resource consumption is 77% lower than that of state-of-the-art works, demonstrating the compactness of our design.
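The abstract does not give the polynomial coefficients or the input segmentation, so the following is only a minimal Python sketch of the general technique: piecewise second-order (quadratic) approximation of the sigmoid and tanh activations used in LSTM gates. The segment width, the saturation bound `X_MAX`, and the choice to fit each segment by interpolating at its endpoints and midpoint are all illustrative assumptions, not the paper's parameters. A hardware design would more likely store precomputed coefficients in a small lookup table in fixed-point form.

```python
import math

# Illustrative (hypothetical) configuration, not taken from the paper.
X_MAX = 8.0          # for |x| >= X_MAX the sigmoid is treated as saturated
SEGMENT_WIDTH = 0.5  # width of each quadratic segment on [0, X_MAX)
NUM_SEGMENTS = int(X_MAX / SEGMENT_WIDTH)


def _sigmoid_exact(x):
    return 1.0 / (1.0 + math.exp(-x))


def _fit_quadratic(x0, x1, x2, f):
    """Return (a, b, c) so that a*x^2 + b*x + c interpolates f at x0, x1, x2."""
    y0, y1, y2 = f(x0), f(x1), f(x2)
    # Newton divided differences:
    # p(x) = y0 + d01*(x - x0) + a*(x - x0)*(x - x1)
    d01 = (y1 - y0) / (x1 - x0)
    d12 = (y2 - y1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)
    # Expand the Newton form into monomial coefficients.
    b = d01 - a * (x0 + x1)
    c = y0 - d01 * x0 + a * x0 * x1
    return a, b, c


def _build_sigmoid_table():
    """One (a, b, c) triple per segment; plays the role of a coefficient ROM."""
    table = []
    for i in range(NUM_SEGMENTS):
        lo = i * SEGMENT_WIDTH
        hi = lo + SEGMENT_WIDTH
        table.append(_fit_quadratic(lo, 0.5 * (lo + hi), hi, _sigmoid_exact))
    return table


SIGMOID_TABLE = _build_sigmoid_table()


def sigmoid_approx(x):
    """Piecewise second-order approximation of the logistic sigmoid."""
    if x < 0:
        return 1.0 - sigmoid_approx(-x)  # symmetry: s(-x) = 1 - s(x)
    if x >= X_MAX:
        return 1.0  # saturation region
    index = min(int(x / SEGMENT_WIDTH), NUM_SEGMENTS - 1)
    a, b, c = SIGMOID_TABLE[index]
    return (a * x + b) * x + c  # Horner form: 2 multiplies, 2 adds


def tanh_approx(x):
    """tanh derived from the same table via tanh(x) = 2*s(2x) - 1."""
    return 2.0 * sigmoid_approx(2.0 * x) - 1.0


if __name__ == "__main__":
    xs = [i / 100.0 for i in range(-1000, 1001)]
    sig_err = max(abs(sigmoid_approx(x) - _sigmoid_exact(x)) for x in xs)
    tanh_err = max(abs(tanh_approx(x) - math.tanh(x)) for x in xs)
    print(f"max sigmoid error: {sig_err:.2e}")
    print(f"max tanh error:    {tanh_err:.2e}")
```

Evaluating each segment in Horner form, `(a*x + b)*x + c`, needs only two multiplications and two additions, which is one reason quadratic approximations are attractive when multipliers are a limited hardware resource. Deriving tanh from the sigmoid through tanh(x) = 2σ(2x) − 1 lets a single coefficient table serve both activations in this sketch.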