Reducing the Computational Complexity of Two-Dimensional LSTMs

Bo Li,Tara N. Sainath
DOI: https://doi.org/10.21437/Interspeech.2017-1164
2017-08-20
Abstract:Long Short-Term Memory Recurrent Neural Networks (LSTMs) are good at modeling temporal variations in speech recognition tasks, and have become an integral component of many state-of-the-art ASR systems. More recently, LSTMs have been extended to model variations in the speech signal in two dimensions, namely time and frequency [1, 2]. However, one of the problems with two-dimensional LSTMs, such as Grid-LSTMs, is that the processing in both time and frequency occurs sequentially, thus increasing computational complexity. In this work, we look at minimizing the dependence of the Grid-LSTM with respect to previous time and frequency points in the sequence, thus reducing computational complexity. Specifically, we compare reducing computation using a bidirectional Grid-LSTM (biGrid-LSTM) with non-overlapping frequency sub-band processing, a PyraMiD-LSTM [3] and a frequency-block Grid-LSTM (fbGrid-LSTM) for parallel time-frequency processing. We find that the fbGrid-LSTM can reduce computation costs by a factor of four with no loss in accuracy, on a 12,500 hour Voice Search task.
Computer Science
What problem does this paper attempt to address?