Approximate Computing for Long Short Term Memory (LSTM) Neural Networks

Sanchari Sen,Anand Raghunathan
DOI: https://doi.org/10.1109/tcad.2018.2858362
2018-11-01
Abstract:Long Short Term Memory (LSTM) networks are a class of recurrent neural networks that are widely used for machine learning tasks involving sequences, including machine translation, text generation, and speech recognition. Large-scale LSTMs, which are deployed in many real-world applications, are highly compute intensive. To address this challenge, we propose AxLSTM, an application of approximate computing to improve the execution efficiency of LSTMs. An LSTM is composed of cells, each of which contains a cell state along with multiple gating units that control the addition and removal of information from the state. The LSTM execution proceeds in timesteps, with a new symbol of the input sequence processed at each timestep. AxLSTM consists of two techniques—Dynamic Timestep Skipping (DTS) and Dynamic State Reduction (DSR). DTS identifies, at runtime, input symbols that are likely to have little or no impact on the cell state and skips evaluating the corresponding timesteps. In contrast, DSR reduces the size of the cell state in accordance with the complexity of the input sequence, leading to a reduced number of computations per timestep. We describe how AxLSTM can be applied to the most common application of LSTMs, viz., sequence-to-sequence learning. We implement AxLSTM within the TensorFlow deep learning framework and evaluate it on 3 state-of-the-art sequence-to-sequence models. On a 2.7 GHz Intel Xeon server with 128 GB memory and 32 processor cores, AxLSTM achieves $ {1.08\times -1.31 \times }$ speedups with minimal loss in quality, and $ {1.12 \times -1.37 \times }$ speedups when moderate reductions in quality are acceptable.
engineering, electrical & electronic,computer science, interdisciplinary applications, hardware & architecture
What problem does this paper attempt to address?