Abstract: Fast adaptation of deep neural networks (DNNs) is an important research topic in deep learning. In this paper, we propose a general adaptation scheme for DNNs based on discriminant condition codes, which are fed directly to various layers of a pre-trained DNN through a new set of connection weights. We present several training methods for learning the connection weights from training data, together with the corresponding adaptation methods for learning a new condition code from adaptation data for each new test condition. In this work, the fast adaptation scheme is applied to supervised speaker adaptation in speech recognition, using either the frame-level cross-entropy or the sequence-level maximum mutual information training criterion. We propose three ways to apply the scheme, based on so-called speaker codes: i) nonlinear feature normalization in feature space; ii) direct model adaptation of the DNN based on speaker codes; iii) joint speaker adaptive training with speaker codes. We evaluate the proposed adaptation methods on two standard speech recognition tasks: TIMIT phone recognition and large-vocabulary speech recognition on Switchboard. Experimental results show that all three methods are quite effective at adapting large DNN models using only a small amount of adaptation data. For example, on Switchboard the proposed speaker-code-based adaptation methods achieve up to 8-10% relative error reduction using only a few dozen adaptation utterances per speaker. Finally, we achieve very good performance on Switchboard (12.1% WER) after speaker adaptation with the sequence training criterion, which is very close to the best performance reported on this task (“Deep convolutional neural networks for LVCSR,” T. N. Sainath, Proc. IEEE Acoust., Speech, Signal Process., 2013).
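The core idea in the abstract (a per-speaker code fed into a pre-trained network through extra connection weights, with only the code updated at adaptation time) can be illustrated with a small NumPy sketch. This is a toy illustration under stated assumptions, not the paper's implementation: the network has a single hidden layer, the code is injected into that one layer only, adaptation uses the frame-level cross-entropy criterion with plain gradient descent, and all names (`init_params`, `forward`, `adapt_code`, `B1`) and sizes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_params(rng, n_in=4, n_hid=8, n_out=3, code_dim=2, scale=0.5):
    """Stand-in for a pre-trained model (random weights, hypothetical sizes)."""
    return {
        "W1": scale * rng.normal(size=(n_in, n_hid)),      # input -> hidden
        "B1": scale * rng.normal(size=(code_dim, n_hid)),  # code -> hidden (the extra connection weights)
        "b1": np.zeros(n_hid),
        "W2": scale * rng.normal(size=(n_hid, n_out)),     # hidden -> output
        "b2": np.zeros(n_out),
    }

def forward(x, code, p):
    """The hidden layer sees both the acoustic features and the speaker code."""
    h = sigmoid(x @ p["W1"] + code @ p["B1"] + p["b1"])
    y = softmax(h @ p["W2"] + p["b2"])
    return h, y

def cross_entropy(y, labels):
    """Mean frame-level cross-entropy for integer class labels."""
    return -np.mean(np.log(y[np.arange(len(labels)), labels] + 1e-12))

def adapt_code(x, labels, p, code_dim, steps=100, lr=0.1):
    """Learn a new code for one speaker; the network weights in p are never modified."""
    code = np.zeros(code_dim)
    n = len(labels)
    for _ in range(steps):
        h, y = forward(x, code, p)
        dz2 = y.copy()
        dz2[np.arange(n), labels] -= 1.0       # gradient of mean CE w.r.t. output logits ...
        dz2 /= n                               # ... including the 1/n from the mean
        dh = dz2 @ p["W2"].T                   # back-propagate to the hidden activations
        dz1 = dh * h * (1.0 - h)               # through the sigmoid
        grad_code = (dz1 @ p["B1"].T).sum(axis=0)  # the code is shared by all frames
        code -= lr * grad_code                 # update the code only
    return code
```

Because the only free parameters at adaptation time are the few entries of the code vector, a small number of labelled utterances can be enough to estimate them, which is consistent with the abstract's claim of adaptation from a few dozen utterances per speaker.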

Online Speaker Adaptation Using Memory-Aware Networks for Speech Recognition

Online Speaker Adaptation for LVCSR Based on Attention Mechanism

Agmma: A Novel Incremental Adaptation Method And Its Application To Speaker Recognition

Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition

On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition

Online Speaker Adaptation for WaveNet-based Neural Vocoders

Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation

Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition

Phoneme Dependent Speaker Embedding And Model Factorization For Multi-Speaker Speech Synthesis And Adaptation

Unsupervised Speaker Adaptation Of Deep Neural Network Based On The Combination Of Speaker Codes And Singular Value Decomposition For Speech Recognition

Bayesian Learning for Deep Neural Network Adaptation

Personality-memory Gated Adaptation: an Efficient Speaker Adaptation for Personalized End-to-end Automatic Speech Recognition

Fast Adaptation of Deep Neural Network Based on Discriminant Codes for Speech Recognition

Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis

An Attention-Based Speaker Naming Method for Online Adaptation in Non-Fixed Scenarios

Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR

An New Approach for Incremental Speaker Adaptation

USAT: A Universal Speaker-Adaptive Text-to-Speech Approach

A Unified Speaker Adaptation Method for Speech Synthesis using Transcribed and Untranscribed Speech with Backpropagation

ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding

Investigating Online Low-Footprint Speaker Adaptation Using Generalized Linear Regression and Click-Through Data