Abstract: Fast adaptation of deep neural networks (DNNs) is an important research topic in deep learning. In this paper, we propose a general adaptation scheme for DNNs based on discriminant condition codes, which are fed directly to various layers of a pre-trained DNN through a new set of connection weights. We also present several training methods to learn the connection weights from training data, together with the corresponding adaptation methods to learn a new condition code from adaptation data for each new test condition. In this work, the fast adaptation scheme is applied to supervised speaker adaptation in speech recognition, using either the frame-level cross-entropy or the sequence-level maximum mutual information training criterion. We propose three ways to apply this adaptation scheme based on the so-called speaker codes: i) nonlinear feature normalization in feature space; ii) direct model adaptation of the DNN based on speaker codes; iii) joint speaker adaptive training with speaker codes. We evaluate the proposed adaptation methods on two standard speech recognition tasks, namely TIMIT phone recognition and large vocabulary speech recognition on the Switchboard task. Experimental results show that all three methods are quite effective at adapting large DNN models using only a small amount of adaptation data. For example, the Switchboard results show that the proposed speaker-code-based adaptation methods may achieve up to 8-10% relative error reduction using only a few dozen adaptation utterances per speaker. Finally, we achieve very good performance on Switchboard (12.1% WER) after speaker adaptation using the sequence training criterion, which is very close to the best performance reported on this task (“Deep convolutional neural networks for LVCSR,” T. N. Sainath et al., Proc. IEEE Acoust., Speech, Signal Process., 2013).
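
The core idea in the abstract, a per-condition code that enters the network through its own connection weights and is the only thing learned at adaptation time, can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a single sigmoid hidden layer, frame-level cross-entropy, and plain gradient descent, and all names (`forward`, `adapt_code`, `W1`, `B1`, `W2`) and the synthetic data are hypothetical. In the paper the codes feed several layers and the connection weights are themselves learned from training data; here they are simply random and held fixed.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(x, code, W1, B1, W2):
    """One hidden layer; the code reaches it through the extra weights B1."""
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + code @ B1)))  # sigmoid hidden layer
    return h, softmax(h @ W2)

def cross_entropy(probs, labels):
    """Mean frame-level cross-entropy for integer class labels."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def adapt_code(x, labels, W1, B1, W2, code_dim, steps=300, lr=0.05):
    """Learn a code for a new condition; W1, B1 and W2 stay frozen."""
    code = np.zeros(code_dim)
    for _ in range(steps):
        h, p = forward(x, code, W1, B1, W2)
        d_out = p.copy()
        d_out[np.arange(len(labels)), labels] -= 1.0  # per-frame gradient w.r.t. logits
        d_hid = (d_out @ W2.T) * h * (1.0 - h)         # back through the sigmoid
        grad = (d_hid @ B1.T).mean(axis=0)             # gradient of mean loss w.r.t. code
        code -= lr * grad
    return code

# Synthetic demo (assumed sizes and random weights, for illustration only).
rng = np.random.default_rng(0)
D, H, K, C = 8, 16, 3, 4                     # input, hidden, class, code dims
W1 = rng.normal(scale=0.5, size=(D, H))
B1 = rng.normal(scale=0.5, size=(C, H))
W2 = rng.normal(scale=1.0, size=(H, K))

x = rng.normal(size=(200, D))                # "adaptation frames"
true_code = rng.normal(size=C)               # hidden condition that generated the labels
labels = forward(x, true_code, W1, B1, W2)[1].argmax(axis=1)

loss_before = cross_entropy(forward(x, np.zeros(C), W1, B1, W2)[1], labels)
code_hat = adapt_code(x, labels, W1, B1, W2, code_dim=C)
loss_after = cross_entropy(forward(x, code_hat, W1, B1, W2)[1], labels)
print(f"cross-entropy before: {loss_before:.3f}, after: {loss_after:.3f}")
```

Because only the `C`-dimensional code is updated, the number of free parameters at adaptation time is tiny compared with the network itself, which is consistent with the abstract's claim that a small amount of adaptation data suffices.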

Rapid Adaptation For Deep Neural Networks Through Multi-Task Learning

Linguistic Feedback Supports Rapid Adaptation to Acoustically Degraded Speech

Fast Adaptation of Deep Neural Network Based on Discriminant Codes for Speech Recognition

Bayesian Learning for Deep Neural Network Adaptation

Batch Normalization based Unsupervised Speaker Adaptation for Acoustic Models

CTC Regularized Model Adaptation for Improving LSTM RNN Based Multi-Accent Mandarin Speech Recognition

Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models

Extended Low-Rank Plus Diagonal Adaptation for Deep and Recurrent Neural Networks

An Active Learning Approach to Task Adaptation

Multi-Channel Feature Adaptation for Robust Speech Recognition

Recurrent Neural Network Based Language Model Adaptation for Accent Mandarin Speech

State-Clustering Based Multiple Deep Neural Networks Modeling Approach for Speech Recognition

Exploring Gaussian mixture model framework for speaker adaptation of deep neural network acoustic models

Improving Accented Mandarin Speech Recognition by Using Recurrent Neural Network Based Language Model Adaptation

Improving BLSTM RNN Based Mandarin Speech Recognition Using Accent Dependent Bottleneck Features

Low-Rank Plus Diagonal Adaptation For Deep Neural Networks

Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention

Transfer Learning Based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis

Fast and accurate factorized neural transducer for text adaption of end-to-end speech recognition models

Unsupervised Adaptation with Domain Separation Networks for Robust Speech Recognition

Speaker Adaptation and Adaptive Training for Jointly Optimised Tandem Systems