Abstract: It is widely believed that strong correlations exist across an utterance as a consequence of time-invariant characteristics of the speaker and the acoustic environment. It is verified in this paper that the first primary eigendirections of the utterance covariance matrix are speaker-dependent. Based on this observation, a novel family of fast speaker adaptation algorithms entitled Eigenspace Mapping (EigMap) is proposed. The proposed algorithms are applied to continuous-density Hidden Markov Model (HMM) based speech recognition. The EigMap algorithm rapidly constructs discriminative acoustic models in the test speaker's eigenspace by preserving discriminative information learned from baseline models in the directions of the test speaker's eigenspace. Moreover, the adapted models are compressed by discarding model parameters that are assumed to contain no discriminative information. The core idea of EigMap can be extended in many ways, and a family of algorithms based on EigMap is described in this paper. Unsupervised adaptation experiments show that EigMap is effective in improving baseline models using very limited amounts of adaptation data, with superior performance to conventional adaptation techniques such as MLLR and block-diagonal MLLR. A relative improvement of 18.4% over a baseline recognizer is achieved using EigMap with only about 4.5 s of adaptation data. Furthermore, it is also demonstrated that EigMap is additive to MLLR, because it captures important speaker-dependent discriminative information. A significant relative improvement of 24.6% over the baseline is observed using 4.5 s of adaptation data by combining the MLLR and EigMap techniques.
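The observation the abstract builds on, that the leading eigendirections of an utterance's covariance matrix carry speaker-dependent structure, can be illustrated with a small sketch. This is not the EigMap algorithm, which the abstract does not specify in detail. It only shows how the leading eigendirections of a covariance matrix might be estimated from a sequence of feature frames. The function names (`covariance`, `top_eigendirections`), the power-iteration method, and the synthetic three-dimensional frames are all assumptions made for illustration.

```python
import math
import random


def covariance(frames):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in frames:
        c = [f[j] - mean[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / (n - 1)
    return cov


def matvec(a, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def top_eigendirections(cov, k, iters=500, seed=0):
    """Leading k (eigenvalue, eigenvector) pairs of a symmetric PSD matrix.

    Uses power iteration with deflation, chosen here only to keep the
    sketch dependency-free.
    """
    d = len(cov)
    a = [row[:] for row in cov]  # work on a copy; deflation modifies it
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(a, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(a, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate: remove the component just found before the next pass.
        for i in range(d):
            for j in range(d):
                a[i][j] -= lam * v[i] * v[j]
    return pairs


# Synthetic "utterance": 200 frames whose variance is largest along axis 0.
rng = random.Random(1)
frames = [[3.0 * rng.gauss(0, 1), rng.gauss(0, 1), 0.1 * rng.gauss(0, 1)]
          for _ in range(200)]
pairs = top_eigendirections(covariance(frames), k=2)
for lam, vec in pairs:
    print(round(lam, 3), [round(x, 3) for x in vec])
```

On real data the frames would be acoustic feature vectors (for example cepstral coefficients) from one utterance, and a library routine such as `numpy.linalg.eigh` would normally replace the hand-written power iteration.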

Speaker adaptation based on combination of MAP estimation and weighted neighbor regression

Speaker Adaptation with MAP Estimation and Weighted Neighbor Regression

A Novel Speaker Adaptation Method Based on MAP and NNLR

MAP-based Speaker Adaptation in Speech Synthesis

Speaker Normalization Training and Adaptation for Speech Recognition

Agmma: A Novel Incremental Adaptation Method And Its Application To Speaker Recognition

Eigenvoice-based MAP Adaptation Within Correlation Subspace

Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition

Speaker Adaptation for Telephony Data Using Speaker Clustering

A New Subspace Based Speaker Adaptation Method

Speaker adaptation using maximum likelihood model interpolation

Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR

Eigenvoice-based MAP Fast Adaptation in Correlation Subspaces

Codebook-Based Speaker Adaptation

Adapting noisy speech models — Extended uncertainty decoding

Online Speaker Adaptation for WaveNet-based Neural Vocoders

Phoneme Dependent Speaker Embedding And Model Factorization For Multi-Speaker Speech Synthesis And Adaptation

Speech Recognition Using Speaker Adaptation by System Parameter Transformation

Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis

Rapid discriminative acoustic model based on eigenspace mapping for fast speaker adaptation

Linguistic tree based maximum likelihood model interpolation