Subword language modeling with neural networks

Tomáš Mikolov, Ilya Sutskever, Anoop Deoras, Hai-Son Le, Stefan Kombrink, Jan Cernocky
2012-01-01
Abstract: We explore the performance of several types of language models on word-level and character-level language modeling tasks. These include two recently proposed recurrent neural network architectures, a feedforward neural network model, a maximum entropy model, and the usual smoothed n-gram models. We then propose a simple technique for learning sub-word level units from the data, and show that it combines the advantages of both character-level and word-level models. Finally, we show that neural network based language models can be an order of magnitude smaller than compressed n-gram models at the same level of performance when applied to a Broadcast News RT04 speech recognition task. By using sub-word units, the size can be reduced even more.