Exploiting Future Word Contexts in Neural Network Language Models for Speech Recognition.
Xie Chen, Xunying Liu, Yu Wang, Anton Ragni, Jeremy H. M. Wong, Mark J. F. Gales
DOI: https://doi.org/10.1109/TASLP.2019.2922048
2019-01-01
Abstract:
Language modeling is a crucial component in a wide range of applications including speech recognition. Language models (LMs) are usually constructed by splitting a sentence into words and computing the probability of each word based on its word history. This sentence probability calculation, making use of conditional probability distributions, assumes that there is little impact from approximations used in the LMs, including the word history representations and finite training data. This motivates examining models that make use of additional information from the sentence. In this paper, future word information, in addition to the history, is used to predict the probability of the current word. For recurrent neural network LMs (RNNLMs), this information can be encapsulated in a bi-directional model. However, if used directly, this form of model is computationally expensive when trained on large quantities of data, and can be problematic when used with word lattices. This paper proposes a novel neural network language model structure, the succeeding-word RNNLM (su-RNNLM), to address these issues. Instead of using a recurrent unit to capture the complete future word context, a feedforward unit is used to model a fixed, finite number of succeeding words. This is more efficient in training than bi-directional models and can be applied to lattice rescoring. The generated lattices can be used for downstream applications, such as confusion network decoding and keyword search. Experimental results on speech recognition and keyword spotting tasks illustrate the empirical usefulness of future word information, and the flexibility of the proposed model to represent this information.
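The idea of pairing a recurrent history with a feedforward unit over a fixed window of succeeding words can be illustrated with a minimal sketch. This is not the authors' implementation: the class name SuRNNLMSketch, the tanh layers, the concatenation of history and future vectors, the random untrained weights, and the use of word id 0 as a sentence-boundary pad are all assumptions made for illustration.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class SuRNNLMSketch:
    """Hypothetical sketch: recurrent history plus a feedforward unit
    over the k succeeding words (assumed layout, untrained weights)."""

    def __init__(self, vocab_size, dim, k, seed=0):
        rng = random.Random(seed)
        self.dim, self.k = dim, k
        self.pad = 0  # assumed id for sentence start/end padding
        self.emb = rand_matrix(vocab_size, dim, rng)      # word embeddings
        self.w_in = rand_matrix(dim, dim, rng)            # previous word -> hidden
        self.w_rec = rand_matrix(dim, dim, rng)           # hidden -> hidden
        self.w_fut = rand_matrix(dim, k * dim, rng)       # k future words -> future vector
        self.w_out = rand_matrix(vocab_size, 2 * dim, rng)

    def word_probs(self, words):
        """For each position t, return a distribution over the vocabulary
        conditioned on the history w_<t and the window w_{t+1..t+k}."""
        hidden = [0.0] * self.dim
        prev = self.pad
        dists = []
        for t in range(len(words)):
            # Recurrent unit: summarises the complete word history.
            a = matvec(self.w_in, self.emb[prev])
            b = matvec(self.w_rec, hidden)
            hidden = [math.tanh(x + y) for x, y in zip(a, b)]
            # Feedforward unit: sees only the next k words (padded at the end).
            window = []
            for j in range(1, self.k + 1):
                idx = words[t + j] if t + j < len(words) else self.pad
                window.extend(self.emb[idx])
            future = [math.tanh(x) for x in matvec(self.w_fut, window)]
            dists.append(softmax(matvec(self.w_out, hidden + future)))
            prev = words[t]
        return dists


# Usage: word ids are arbitrary integers below vocab_size.
model = SuRNNLMSketch(vocab_size=6, dim=4, k=2)
probs = model.word_probs([1, 2, 3, 4])
```

Because the future context is a fixed window of k words rather than a recurrence over the rest of the sentence, the prediction at position t in this sketch does not depend on any word beyond t + k, which is the property that makes a bounded future context compatible with lattice rescoring.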