Abstract: In our previous work, we introduced an attention-based speaker adaptation method, which has been shown to be an efficient online speaker adaptation method for real-time speech recognition. In this paper, we present a more complete framework for this method, named memory-aware networks, which consists of the main network, the memory module, the attention module and the connection module. A gate mechanism and a multiple-connections strategy are presented to connect the memory with the main network in order to take full advantage of the memory. An auxiliary speaker classification task is added to improve the accuracy of the attention module. The fixed-size ordinally forgetting encoding method is used together with average pooling to gather both short-term and long-term information. Furthermore, instead of using only traditional speaker embeddings such as i-vectors or d-vectors as the memory, we design a new form of memory called residual vectors, which can represent different pronunciation habits. Experiments on both the Switchboard and AISHELL-2 tasks show that our method performs online speaker adaptation well with no additional adaptation data and with only a 3% relative increase in decoding computational complexity. Under the cross-entropy criterion, our method achieves relative word error rate reductions of 9.4% and 8.3% compared to the speaker-independent model on the Switchboard task and the AISHELL-2 task, respectively, and approximately 7.0% compared to the traditional d-vector-based speaker adaptation method.
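
The combination of fixed-size ordinally forgetting encoding (FOFE) and average pooling mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard FOFE recursion z_t = alpha * z_{t-1} + x_t, which weights recent frames more heavily, alongside a plain mean over all frames. The function names, the forgetting factor value, and the choice to concatenate the two summaries are assumptions made for illustration; the abstract does not specify how the paper combines them.

```python
def fofe(frames, alpha=0.9):
    """FOFE summary of a frame sequence: z_t = alpha * z_{t-1} + x_t.

    Recent frames dominate because older ones are scaled by powers of alpha.
    """
    z = [0.0] * len(frames[0])
    for x in frames:
        z = [alpha * zi + xi for zi, xi in zip(z, x)]
    return z


def average_pool(frames):
    """Mean of all frames: every frame contributes equally."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(x[d] for x in frames) / n for d in range(dim)]


def summarize(frames, alpha=0.9):
    """Hypothetical combination: concatenate the FOFE and average summaries."""
    return fofe(frames, alpha) + average_pool(frames)


if __name__ == "__main__":
    # Two 2-dimensional frames; alpha = 0.5 keeps the arithmetic exact.
    frames = [[1.0, 0.0], [0.0, 1.0]]
    print(summarize(frames, alpha=0.5))  # [0.5, 1.0, 0.5, 0.5]
```

In this toy input the FOFE half of the output distinguishes the order of the two frames, while the average half does not, which is one way to read the short-term versus long-term distinction drawn in the abstract.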

Just-in-time Latent Semantic Adaptation on Language Model for Chinese Speech Recognition Using Web Data

Linguistic Feedback Supports Rapid Adaptation to Acoustically Degraded Speech

A new latent semantic analysis language model

An Online Incremental Language Model Adaptation Method

A Public Chinese Dataset for Language Model Adaptation

Language model adaptation based on correction information for interactive speech transcription

Online Speaker Adaptation for LVCSR Based on Attention Mechanism

CLMAD: A Chinese Language Model Adaptation Dataset

Hybrid Attention-based Encoder-decoder Model for Efficient Language Model Adaptation

Web-based keyword adapted Language Modeling for Keyword Spotting

Recurrent Neural Network Based Language Model Adaptation for Accent Mandarin Speech

Reducing time-synchronous beam search effort using stage based look-ahead and language model rank based pruning

Online Speaker Adaptation Using Memory-Aware Networks for Speech Recognition

An Online Attention-based Model for Speech Recognition

Improving Accented Mandarin Speech Recognition by Using Recurrent Neural Network Based Language Model Adaptation

Investigating Online Low-Footprint Speaker Adaptation Using Generalized Linear Regression and Click-Through Data

An Active Learning Approach to Task Adaptation

Model Adaptation Using the Projection to Latent Structure Algorithm

CTC Regularized Model Adaptation for Improving LSTM RNN Based Multi-Accent Mandarin Speech Recognition

Internal Language Model Estimation based Adaptive Language Model Fusion for Domain Adaptation

Toward On-Line Learning of Chinese Continuous Speech Recognition System