A High Energy-Efficiency FPGA-Based LSTM Accelerator Architecture Design by Structured Pruning and Normalized Linear Quantization

Yong Zheng, Haigang Yang, Zhihong Huang, Tianli Li, Yiping Jia
DOI: https://doi.org/10.1109/ICFPT47387.2019.00045
2019-01-01
Abstract: LSTM (Long Short-Term Memory) is an artificial recurrent neural network (RNN) architecture that has been successfully applied in areas that process sequential data, such as Natural Language Processing (NLP) and speech recognition. In this work, we explore how to minimize an FPGA-based LSTM inference design for high performance and energy efficiency. First, the model is pruned with permuted block diagonal mask matrices to create structured sparsity that is hardware-friendly. To further compress the model, we quantize the weights and activations following a normalized linear quantization approach. As a result, the computational activity of the network is significantly reduced with a negligible loss in accuracy. We then devise a hardware architecture that fully exploits the benefits of the regular sparse structure. Implemented on an Arria 10 (10AX115U4F45I3SG) FPGA running at 150 MHz, our accelerator demonstrates a peak performance of 2.22 TOPS at a power dissipation of 1.679 W. Compared with previously reported FPGA-based LSTM accelerator designs, our approach achieves a 1.17-2.16x speedup in processing.
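The two compression steps named in the abstract can be illustrated with a small sketch. The abstract does not give the exact construction, so everything below is an assumption made for illustration: the mask is built as a block diagonal 0/1 matrix whose rows are then shuffled, and "normalized linear quantization" is read as scaling values by their largest magnitude before uniform rounding to a signed integer grid. The function names, the `num_blocks` and `bits` parameters, and the seeded shuffle are all hypothetical and are not taken from the paper.

```python
import random


def block_diagonal_mask(rows, cols, num_blocks):
    """Return a 0/1 mask with num_blocks dense blocks along the diagonal."""
    if rows % num_blocks or cols % num_blocks:
        raise ValueError("rows and cols must be divisible by num_blocks")
    block_h, block_w = rows // num_blocks, cols // num_blocks
    return [
        [1 if r // block_h == c // block_w else 0 for c in range(cols)]
        for r in range(rows)
    ]


def permute_rows(mask, seed=0):
    """Shuffle the rows of the mask (assumed form of the permutation)."""
    order = list(range(len(mask)))
    random.Random(seed).shuffle(order)
    return [mask[i] for i in order]


def quantize_normalized(values, bits=8):
    """Normalize by the largest magnitude, then round to signed integer codes."""
    scale = max(abs(v) for v in values) or 1.0  # avoid dividing by zero
    levels = 2 ** (bits - 1) - 1                # e.g. 127 for 8 bits
    codes = [round(v / scale * levels) for v in values]
    return codes, scale


def dequantize(codes, scale, bits=8):
    """Map integer codes back to approximate real values."""
    levels = 2 ** (bits - 1) - 1
    return [c / levels * scale for c in codes]
```

Under these assumptions, every row and every column of the permuted mask keeps the same number of nonzeros, which is the kind of regularity a hardware design can exploit for balanced workloads. For example, `permute_rows(block_diagonal_mask(8, 8, 4))` keeps two weights per row and per column, and `quantize_normalized(weights, bits=8)` returns integer codes plus the single scale factor needed to recover approximate values with `dequantize`.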