Learning Sparse Hidden States In Long Short-Term Memory

Niange Yu, Cornelius Weber, Xiaolin Hu
DOI: https://doi.org/10.1007/978-3-030-30484-3_24
2019-01-01
Abstract: Long Short-Term Memory (LSTM) is a powerful recurrent neural network architecture that is used successfully in many sequence modeling applications. Inside an LSTM unit, a vector called the "memory cell" is used to memorize the history. Another important vector, which works along with the memory cell, represents the hidden states and is used to make a prediction at a specific step. Memory cells record the entire history, while the hidden states at a specific time step generally need to attend to only a very limited part of it. There is therefore an imbalance between the large amount of information carried by a memory cell and the small amount of information required by the hidden states at a specific step. We propose to explicitly impose sparsity on the hidden states to adapt them to the required information. Extensive experiments show that sparsity reduces the computational complexity and improves the performance of LSTM networks. (The source code is available at https://github.com/feiyuhug/SHS_LSTM/tree/master.)
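
The abstract does not say how sparsity is imposed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a standard LSTM step in NumPy and a hypothetical top-k rule (`sparsify_top_k`) that keeps the k largest-magnitude hidden units and zeroes the rest, while the memory cell stays dense. All names, shapes, and the top-k rule itself are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sparsify_top_k(h, k):
    """Keep the k largest-magnitude entries of h and set the rest to zero.
    Hypothetical sparsification rule, chosen only for illustration."""
    if k <= 0:
        return np.zeros_like(h)
    if k >= h.size:
        return h.copy()
    out = np.zeros_like(h)
    idx = np.argpartition(np.abs(h), -k)[-k:]  # indices of the k largest |h|
    out[idx] = h[idx]
    return out

def lstm_step_sparse(x, h_prev, c_prev, W, U, b, k):
    """One LSTM step: dense memory cell c, top-k sparse hidden state h.
    Assumed shapes: W (4n, d), U (4n, n), b (4n,), with n hidden units."""
    n = h_prev.size
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:n])          # input gate
    f = sigmoid(z[n:2 * n])      # forget gate
    o = sigmoid(z[2 * n:3 * n])  # output gate
    g = np.tanh(z[3 * n:4 * n])  # candidate cell update
    c = f * c_prev + i * g       # memory cell keeps the full history
    h_dense = o * np.tanh(c)     # usual dense hidden state
    h = sparsify_top_k(h_dense, k)
    return h, c

# Run a few steps on random inputs.
rng = np.random.default_rng(0)
d, n, k = 5, 8, 2
W = rng.standard_normal((4 * n, d))
U = rng.standard_normal((4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for _ in range(3):
    h, c = lstm_step_sparse(rng.standard_normal(d), h, c, W, U, b, k)
```

Under this assumed rule, at most k entries of `h` are nonzero after each step, so the recurrent product `U @ h_prev` could in principle be computed from only k columns of `U`. That is one way a sparse hidden state might lower computational cost; whether it matches the mechanism in the paper should be checked against the linked source code.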