E-LSTM: an Efficient Hardware Architecture for Long Short-Term Memory

Meiqi Wang, Zhisheng Wang, Jinming Lu, Jun Lin, Zhongfeng Wang
DOI: https://doi.org/10.1109/jetcas.2019.2911739
IF: 5.877
2019-01-01
IEEE Journal on Emerging and Selected Topics in Circuits and Systems
Abstract: Long Short-Term Memory (LSTM) and its variants have been widely adopted in many sequential learning tasks, such as speech recognition and machine translation. Significant accuracy improvements can be achieved using complex LSTM models, but their large memory requirements and high computational complexity make them time-consuming and energy-demanding. The low-latency and energy-efficiency requirements of real-world applications make model compression and hardware acceleration for LSTM an urgent need. In this paper, several hardware-efficient network compression schemes are introduced first, including structured top-$k$ pruning, clipped gating, and multiplication-free quantization, to reduce the model size and the number of matrix operations by 32$\times$ and 21.6$\times$, respectively, with negligible accuracy loss. Furthermore, efficient hardware architectures for accelerating the compressed LSTM are proposed, which support inference over multiple layers and multiple time steps. The computation process is judiciously reorganized and the memory access pattern is optimized, which alleviates the limited-memory-bandwidth bottleneck and enables higher throughput. Moreover, the parallel processing strategy is carefully designed to make full use of the sparsity introduced by pruning and clipped gating with high hardware utilization efficiency. Implemented on an Intel Arria 10 SX660 FPGA running at 200 MHz, the proposed design achieves 1.4–2.2$\times$ the energy efficiency of state-of-the-art LSTM implementations while requiring significantly fewer hardware resources.
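As a rough illustration of two of the compression ideas named in the abstract, the sketch below shows a generic per-row top-$k$ magnitude pruning step followed by rounding the surviving weights to signed powers of two, so that a multiplication could in principle be replaced by a shift. This is not the paper's implementation: the function names, the per-row selection rule, the rounding in the log domain, and the example weights are all assumptions made for illustration, and clipped gating is not shown.

```python
import math

def topk_prune_row(row, k):
    """Keep the k largest-magnitude entries of a row and zero the rest.

    Assumption: pruning is applied row by row, so every row ends up with
    the same number of nonzeros (one possible reading of "structured").
    """
    keep = set(sorted(range(len(row)), key=lambda i: abs(row[i]), reverse=True)[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(row)]

def quantize_pow2(w):
    """Round a weight to a signed power of two (nearest in the log2 domain).

    Zero stays zero. With power-of-two weights, multiplying an activation
    by a weight reduces to a bit shift plus a sign flip in fixed point.
    """
    if w == 0.0:
        return 0.0
    exp = round(math.log2(abs(w)))
    return math.copysign(2.0 ** exp, w)

# Hypothetical 2x4 weight matrix, used only for illustration.
weights = [[0.9, -0.05, 0.3, 0.01],
           [-0.2, 0.6, 0.02, -0.7]]

pruned = [topk_prune_row(r, 2) for r in weights]
quantized = [[quantize_pow2(w) for w in r] for r in pruned]

print(pruned)
print(quantized)
```

In this toy case each row keeps two of its four weights, and the kept weights land on values such as 1.0, 0.5, and 0.25. A real design would choose $k$ and the allowed exponent range against an accuracy target.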