Fast OOV Words Incorporation Using Structured Word Embeddings for Neural Network Language Model

Ruinian Chen, Kai Yu
DOI: https://doi.org/10.1109/icassp.2018.8461491
2018-01-01
Abstract: Deep learning approaches have recently been widely used in language modeling and have achieved great success. However, out-of-vocabulary (OOV) words are often estimated crudely, using a single special symbol that ignores their linguistic information. In this paper we present an LSTM language model with structured word embeddings to tackle this problem. In our model, both the input and output embeddings of the LSTM language model use structured word embeddings. By sharing parameters at the syntactic and morphological levels, OOV words can be incorporated into the proposed model without retraining. The model is instantiated for Chinese. Experiments show that the proposed model improves perplexity (PPL) on OOV words and can be further integrated into automatic speech recognition systems for fast vocabulary updating.
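The parameter-sharing idea can be illustrated with a minimal sketch. The abstract does not specify how the structured embedding is composed, so everything here is an assumption made for illustration: the averaging of character vectors, the additive part-of-speech vector, the random stand-in values, and the names `char_emb`, `pos_emb`, `structured_embedding`, and `add_oov_word` are all hypothetical and are not the authors' implementation.

```python
import random

DIM = 4  # toy embedding size (assumption)
random.seed(0)


def rand_vec(dim=DIM):
    """Random stand-in for a trained parameter vector."""
    return [random.uniform(-0.1, 0.1) for _ in range(dim)]


# Shared parameter tables. In a trained model these would be learned;
# here they are random placeholders (hypothetical).
char_emb = {ch: rand_vec() for ch in "语言模型音识别"}  # morphological level
pos_emb = {tag: rand_vec() for tag in ("NOUN", "VERB")}  # syntactic level


def structured_embedding(word, pos):
    """Compose a word vector from shared character and POS vectors.

    Assumed composition: mean of the character vectors plus the POS vector.
    """
    chars = [char_emb[ch] for ch in word]
    mean = [sum(col) / len(chars) for col in zip(*chars)]
    return [m + p for m, p in zip(mean, pos_emb[pos])]


# In-vocabulary words get their embeddings from the shared tables.
vocab = {
    "语言": structured_embedding("语言", "NOUN"),
    "模型": structured_embedding("模型", "NOUN"),
}


def add_oov_word(vocab, word, pos):
    """Add a new word without touching (or retraining) any shared parameter."""
    vocab[word] = structured_embedding(word, pos)
    return vocab[word]


# "语音" was not in the vocabulary, but its characters and POS tag are known.
new_vec = add_oov_word(vocab, "语音", "NOUN")
```

Because the new word's vector is built only from parameters that already exist, adding it changes the vocabulary table and nothing else; the same construction would apply to the output embeddings under the same assumptions.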