Temporal Kernel Neural Network Language Model

YongZhe Shi, Wei-Qiang Zhang, Meng Cai, Jia Liu
DOI: https://doi.org/10.1109/icassp.2013.6639273
2013-01-01
Abstract: Using neural networks to estimate the probabilities of word sequences has shown significant promise for statistical language modeling. Typical approaches include multi-layer neural networks, log-bilinear networks, and recurrent neural networks. In this paper, we propose the temporal kernel neural network language model, a variant of the models mentioned above. The model explicitly captures long-term word dependencies with an exponential kernel, under which the memory of the history decays exponentially. Additionally, sentences of variable length are grouped into a mini-batch and processed efficiently to speed up computation. Experimental results show that the proposed model is highly competitive with the recurrent neural network language model and achieves a perplexity of 111.6 on the standard Penn Treebank corpus, more than a 10% reduction relative to the reported state-of-the-art results. We further apply the model to a Wall Street Journal speech recognition task and observe significant improvements in word error rate.
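To make the idea of an exponentially decaying memory concrete, here is a minimal sketch of one plausible reading: each position keeps a running sum of past word vectors, with older words down-weighted by powers of a decay factor. The function names, the `decay` parameter, and the toy vectors are assumptions made for illustration; the abstract does not specify the paper's actual formulation, so this should not be taken as the authors' model.

```python
def decayed_history(embeddings, decay=0.5):
    """Running exponentially decayed sum of word vectors (illustrative only).

    For each position t this returns h_t = sum_{k=0..t} decay**k * x_{t-k},
    computed with the recurrence h_t = decay * h_{t-1} + x_t.
    """
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    if not embeddings:
        return []
    state = [0.0] * len(embeddings[0])
    history = []
    for x in embeddings:
        # Shrink the old memory, then add the current word vector.
        state = [decay * s + xi for s, xi in zip(state, x)]
        history.append(state)
    return history


def decayed_history_direct(embeddings, t, decay=0.5):
    """Same quantity as decayed_history(...)[t], written as an explicit sum."""
    dim = len(embeddings[0])
    return [
        sum(decay ** k * embeddings[t - k][d] for k in range(t + 1))
        for d in range(dim)
    ]


# Hypothetical 2-dimensional "word vectors" for a three-word sentence.
toy_sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_history = decayed_history(toy_sentence, decay=0.5)
```

With `decay=0.5`, a word seen two steps ago contributes a quarter of its original weight, so distant words still influence the history but progressively less. The recurrence form needs only the previous state, which is what makes this kind of kernel cheap to evaluate over long sentences.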