Abstract: Deep recurrent neural networks (DRNNs) have recently been widely proposed for language modeling. By stacking multiple recurrent layers, DRNNs can learn higher-level features of the input data and so outperform single-layer models. However, because the layers are stacked in a simple linear pattern, gradient information vanishes when it is propagated backward through too many layers. As a result, DRNNs become hard to train, and their performance degrades rapidly as the number of recurrent layers increases. To address this problem, this paper proposes the feature memory-based deep recurrent neural network (FMDRNN). FMDRNN introduces a new stacking pattern built around a feature memory module (FM), which lets the hidden units of each layer see and reuse all the features generated by the preceding stacked layers, not just the features from the previous layer as in DRNNs. FM acts like a traffic hub that provides direct connections between every pair of layers, and the attention network in FM controls the switching of these connections. These direct connections allow FMDRNN to alleviate vanishing gradients during backward propagation and keep the learned features from being washed away before they reach the end of the network. FMDRNN is evaluated through extensive experiments on the widely used English Penn Treebank dataset and on five more complex non-English language corpora. The experimental results show that FMDRNN can be trained effectively even when a larger number of layers are stacked, so it benefits from deeper networks instead of suffering degraded performance, and that it consistently achieves markedly better results than other models with a deeper but thinner network. (C) 2018 Elsevier B.V. All rights reserved.
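
The stacking pattern described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the form of the attention network, so the dot-product scoring, the choice of query, and the names `read_feature_memory`, `toy_layer`, and `forward` are all assumptions made for illustration, and a plain `tanh` map stands in for a real recurrent layer at a single time step.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_feature_memory(memory, query):
    """Return an attention-weighted mix of every feature vector in memory.

    Assumption: scores are plain dot products between the query and each
    stored feature; the paper's attention network may differ.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    mixed = [sum(w * feat[i] for w, feat in zip(weights, memory))
             for i in range(dim)]
    return mixed, weights


def toy_layer(x, scale):
    """Hypothetical stand-in for one recurrent layer at one time step."""
    return [math.tanh(scale * v) for v in x]


def forward(x, num_layers):
    """Stack layers so that each one reads from all earlier features."""
    memory = [x]  # the feature memory starts with the input features
    for layer in range(num_layers):
        query = memory[-1]  # assumption: the latest feature acts as the query
        mixed, _ = read_feature_memory(memory, query)
        memory.append(toy_layer(mixed, scale=1.0 + 0.1 * layer))
    return memory


if __name__ == "__main__":
    features = forward([0.5, -0.2, 0.1], num_layers=4)
    _, weights = read_feature_memory(features, features[-1])
    print(len(features), [round(w, 3) for w in weights])
```

Because every layer reads a weighted mix of all stored features, each earlier layer has a direct path to each later one, which is the property the abstract credits with easing gradient flow in deep stacks.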

Recurrent Neural Network Language Model With Structured Word Embeddings For Speech Recognition

Recurrent Neural Network Language Model with Part-of-speech for Mandarin Speech Recognition.

Fast OOV Words Incorporation Using Structured Word Embeddings for Neural Network Language Model.

Exploiting Future Word Contexts in Neural Network Language Models for Speech Recognition.

Improving Accented Mandarin Speech Recognition by Using Recurrent Neural Network Based Language Model Adaptation

Recurrent Neural Network Based Language Model Adaptation for Accent Mandarin Speech.

Recurrent Neural Networks with Pre-trained Language Model Embedding for Slot Filling Task

A Mongolian Language Model Based on Recurrent Neural Networks

Global context-dependent recurrent neural network language model with sparse feature learning

Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer

Future word contexts in neural network language models

Hierarchical Lexicon Embedding Architecture for Chinese Named Entity Recognition

Compact Feedforward Sequential Memory Networks for Large Vocabulary Continuous Speech Recognition

Word Embedding For Recurrent Neural Network Based TTS Synthesis

A Residual BiLSTM Model for Named Entity Recognition

Build Chinese Language Model with Recurrent Neural Network.

Feature Memory-Based Deep Recurrent Neural Network for Language Modeling

Recurrent Neural Network Language Model Adaptation Derived Document Vector

Neural Network Language Modeling With Letter-Based Features And Importance Sampling

Investigation of Senone-based Long-Short Term Memory RNNs for Spoken Language Recognition

Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model