Training recurrent neural networks

Ilya Sutskever
2013-01-01
Abstract: Recurrent Neural Networks (RNNs) are artificial neural network models that are well suited to pattern classification tasks whose inputs and outputs are sequences. The importance of developing methods for mapping sequences to sequences is exemplified by tasks such as speech recognition, speech synthesis, named-entity recognition, language modelling, and machine translation. An RNN represents a sequence with a high-dimensional vector of fixed dimensionality (called the hidden state) that incorporates new observations using an intricate nonlinear function. RNNs are highly expressive and can implement arbitrary memory-bounded computation, and as a result, they can likely be configured to achieve nontrivial performance on difficult sequence tasks. However, RNNs have turned out to be difficult to train, especially on problems with complicated long-range temporal structure, which is precisely the setting where RNNs ought to be most useful. Since their potential has not been realized, methods that address the difficulty of training RNNs are of great importance.

We became interested in RNNs when we sought to extend the Restricted Boltzmann Machine (RBM; Smolensky, 1986), a widely used density model, to sequences. Doing so was worthwhile because RBMs are not well suited to sequence data, and at the time RBM-like sequence models did not exist. We introduced the Temporal Restricted Boltzmann Machine (TRBM; Sutskever, 2007; Sutskever and Hinton, 2007), which could model highly complex sequences, but its parameter update required the use of crude approximations, which was unsatisfying. To address this issue, we modified …