Language Modeling with Highway LSTM

Gakuto Kurata, Bhuvana Ramabhadran, George Saon, Abhinav Sethy

DOI: https://doi.org/10.48550/arXiv.1709.06436

2017-09-19

Abstract: Language models (LMs) based on Long Short-Term Memory (LSTM) have shown good gains in many automatic speech recognition tasks. In this paper, we extend an LSTM by adding highway networks inside it and use the resulting Highway LSTM (HW-LSTM) model for language modeling. The added highway networks increase the depth in the time dimension. Since a typical LSTM has two internal states, a memory cell and a hidden state, we compare various types of HW-LSTM by adding highway networks onto the memory cell and/or the hidden state. Experimental results on English broadcast news and conversational telephone speech recognition show that the proposed HW-LSTM LM improves speech recognition accuracy on top of a strong LSTM LM baseline. We report word error rates of 5.1% and 9.9% on the Switchboard and CallHome subsets of the Hub5 2000 evaluation, which are the best performance numbers reported on these tasks to date.
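The abstract describes adding highway networks onto the memory cell and/or the hidden state of an LSTM. Below is a minimal NumPy sketch of one time step of such a cell. It assumes the standard highway formulation y = T(x) * H(x) + (1 - T(x)) * x, with a tanh transform H and a sigmoid transform gate T, applied right after the usual LSTM update. The names `highway` and `hw_lstm_step` and the parameter layout are hypothetical, and the exact placement and parametrization used in the paper may differ. Because the highway layer sits inside the recurrence, the path from one time step to the next passes through extra nonlinear layers. This is one way to read "increase the depth in the time dimension".

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def highway(x, W_h, b_h, W_t, b_t):
    """Standard highway layer: y = T * H(x) + (1 - T) * x."""
    h = np.tanh(W_h @ x + b_h)        # candidate transform H(x)
    t = sigmoid(W_t @ x + b_t)        # transform gate T(x) in (0, 1)
    return t * h + (1.0 - t) * x      # carry gate assumed to be 1 - T(x)


def hw_lstm_step(x, h_prev, c_prev, p, variant="H"):
    """One time step of a hypothetical HW-LSTM cell.

    variant: "C" (highway on the memory cell), "H" (on the hidden state)
    or "CH" (both). Placing the highway layer after the LSTM update is an
    assumption of this sketch.
    """
    z = p["W"] @ x + p["U"] @ h_prev + p["b"]   # stacked gate pre-activations
    i, f, o, g = np.split(z, 4)                 # input, forget, output, candidate
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    if "C" in variant:
        c = highway(c, *p["hw_c"])              # HW-LSTM-C / HW-LSTM-CH
    h = sigmoid(o) * np.tanh(c)
    if "H" in variant:
        h = highway(h, *p["hw_h"])              # HW-LSTM-H / HW-LSTM-CH
    return h, c


# Usage: run a toy sequence through an HW-LSTM-CH cell with random weights.
rng = np.random.default_rng(0)
n_in, n_hid = 8, 16


def hw_params():
    # (W_h, b_h, W_t, b_t) for one highway layer; values are illustrative only
    return (rng.normal(scale=0.1, size=(n_hid, n_hid)), np.zeros(n_hid),
            rng.normal(scale=0.1, size=(n_hid, n_hid)), np.full(n_hid, -1.0))


params = {
    "W": rng.normal(scale=0.1, size=(4 * n_hid, n_in)),
    "U": rng.normal(scale=0.1, size=(4 * n_hid, n_hid)),
    "b": np.zeros(4 * n_hid),
    "hw_c": hw_params(),
    "hw_h": hw_params(),
}

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):    # five toy input vectors
    h, c = hw_lstm_step(x, h, c, params, variant="CH")
print(h.shape, c.shape)  # (16,) (16,)
```

Passing `variant="C"` or `variant="H"` selects the single-state variants; the rest of the cell is unchanged.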

Computation and Language

What problem does this paper attempt to address?

This paper aims to improve Long Short-Term Memory (LSTM) language models by adding highway networks inside the LSTM, and thereby to improve the accuracy of automatic speech recognition (ASR). The authors propose three Highway LSTM (HW-LSTM) variants, HW-LSTM-C, HW-LSTM-H and HW-LSTM-CH, which add highway networks to the memory cell, to the hidden state, or to both, respectively. The main contributions of the paper are:
1. A new language modeling technique based on HW-LSTM.
2. A training method for HW-LSTM language models: a regular LSTM is first pre-trained, then converted into an HW-LSTM by adding highway connections, and training continues from there.
3. An evaluation of this method on broadcast news and conversational telephone speech recognition tasks built on publicly available data sets, reaching the best accuracy reported to date on the Switchboard and CallHome subsets of the Hub5 2000 evaluation.

The experimental results show that the HW-LSTM-H variant gives the largest reduction in word error rate (WER), and that using deeper highway networks improves performance further. The study also finds that regular LSTM and HW-LSTM language models are complementary: combining the two reduces the WER further.
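Contribution 2 describes pre-training a regular LSTM and then converting it into an HW-LSTM by adding highway connections before training continues. The following is a minimal sketch of that conversion step for the HW-LSTM-H variant. It assumes the pretrained LSTM weights are copied unchanged and the new highway layer gets small random weights plus a strongly negative transform-gate bias. With that initialization the highway layer starts out close to an identity mapping, so the converted model initially behaves almost like the pretrained LSTM. This initialization scheme, the value `gate_bias=-5.0`, and the function names `lstm_step`, `hw_lstm_h_step` and `convert_to_hw_lstm_h` are assumptions for illustration, not details given in the summary.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """Plain LSTM step with stacked gate weights (input, forget, output, candidate)."""
    z = p["W"] @ x + p["U"] @ h_prev + p["b"]
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    return sigmoid(o) * np.tanh(c), c


def hw_lstm_h_step(x, h_prev, c_prev, p):
    """Hypothetical HW-LSTM-H step: LSTM step followed by a highway layer on h."""
    h, c = lstm_step(x, h_prev, c_prev, p)
    t = sigmoid(p["W_t"] @ h + p["b_t"])                      # transform gate
    h = t * np.tanh(p["W_h"] @ h + p["b_h"]) + (1.0 - t) * h  # highway on h
    return h, c


def convert_to_hw_lstm_h(lstm_params, gate_bias=-5.0, scale=0.01, seed=0):
    """Copy pretrained LSTM weights and add freshly initialized highway weights.

    A negative transform-gate bias keeps T(h) small, so the converted model
    starts out close to the pretrained LSTM (assumed initialization scheme).
    """
    rng = np.random.default_rng(seed)
    n = lstm_params["U"].shape[1]                             # hidden size
    hw = {k: v.copy() for k, v in lstm_params.items()}        # reuse LSTM weights
    hw["W_h"] = rng.normal(scale=scale, size=(n, n))
    hw["b_h"] = np.zeros(n)
    hw["W_t"] = rng.normal(scale=scale, size=(n, n))
    hw["b_t"] = np.full(n, gate_bias)
    return hw


# Usage: compare the "pretrained" LSTM with its freshly converted HW-LSTM-H.
rng = np.random.default_rng(1)
n_in, n_hid = 8, 16
lstm = {  # random weights standing in for a pretrained LSTM LM layer
    "W": rng.normal(scale=0.1, size=(4 * n_hid, n_in)),
    "U": rng.normal(scale=0.1, size=(4 * n_hid, n_hid)),
    "b": np.zeros(4 * n_hid),
}
hw = convert_to_hw_lstm_h(lstm)

h0, c0 = np.zeros(n_hid), np.zeros(n_hid)
h1, c1 = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    h0, c0 = lstm_step(x, h0, c0, lstm)
    h1, c1 = hw_lstm_h_step(x, h1, c1, hw)
print(float(np.max(np.abs(h0 - h1))))  # expected to be small right after conversion
```

In this sketch, continued training would then update all parameters, including the transform-gate bias, so the highway layer can move away from the identity mapping where the extra depth helps.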
