Abstract: In this paper we investigate GMM-derived (GMMD) features for adaptation of deep neural network (DNN) acoustic models. A DNN trained on GMMD features is adapted through maximum a posteriori (MAP) adaptation of the auxiliary GMM used for GMMD feature extraction. We explore fusion of the adapted GMMD features with conventional features, such as bottleneck and MFCC features, in two neural network architectures: DNN and time-delay neural network (TDNN). We compare the proposed approach with other adaptation techniques, namely i-vectors and feature-space maximum likelihood linear regression (fMLLR), and explore their complementarity using several types of fusion (feature level, posterior level, lattice level and others) to find the best way of combining them. Experimental results on the TED-LIUM corpus show that the proposed adaptation technique can be effectively integrated into DNN and TDNN setups at different levels and provides additional gain in recognition performance: up to 6% relative word error rate reduction (WERR) over a strong fMLLR speaker-adapted DNN baseline, and up to 18% relative WERR over a speaker-independent (SI) DNN baseline trained on conventional features. For TDNN models, the proposed approach achieves up to 26% relative WERR over an SI baseline and up to 13% over a model adapted with i-vectors. An analysis of the adapted GMMD features from several points of view demonstrates their effectiveness at different levels.
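
The adaptation idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a diagonal-covariance GMM, a mean-only MAP update with a relevance factor `tau`, and GMMD features taken as per-component Gaussian log-likelihoods; none of these details are stated in the abstract, and the function names `component_loglik`, `map_adapt_means` and `gmmd_features` are hypothetical.

```python
import numpy as np


def component_loglik(means, variances, frames):
    """Log N(x_t | mu_k, diag(var_k)) for every frame t and component k.

    means, variances: (K, D); frames: (T, D). Returns an array of shape (T, K).
    """
    diff = frames[:, None, :] - means[None, :, :]            # (T, K, D)
    mahalanobis = np.sum(diff ** 2 / variances, axis=2)      # (T, K)
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    return -0.5 * (mahalanobis + log_norm)


def map_adapt_means(means, variances, weights, frames, tau=10.0):
    """Mean-only MAP update (assumed form): interpolate the prior means with
    the posterior-weighted data statistics; `tau` is an assumed relevance factor."""
    log_post = np.log(weights) + component_loglik(means, variances, frames)
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    gamma = np.exp(log_post)          # component posteriors, shape (T, K)
    occupancy = gamma.sum(axis=0)     # soft frame counts, shape (K,)
    first_order = gamma.T @ frames    # posterior-weighted sums, shape (K, D)
    return (tau * means + first_order) / (tau + occupancy)[:, None]


def gmmd_features(means, variances, frames):
    """Assumed GMMD features: one log-likelihood per GMM component and frame."""
    return component_loglik(means, variances, frames)
```

Under these assumptions, speaker adaptation amounts to calling `map_adapt_means` on a speaker's frames and then extracting `gmmd_features` with the adapted means, so the network input changes while the network weights stay fixed.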

A Decision Tree-Structured Algorithm Of Speaker Adaptation Based On Gaussian Similarity Analysis

Interpolation adaptation algorithm based on Gaussian similarity analysis

Agmma: A Novel Incremental Adaptation Method And Its Application To Speaker Recognition

Speaker Adaptation and Adaptive Training for Jointly Optimised Tandem Systems.

Speaker Adaptation for Telephony Data Using Speaker Clustering

A Speaker Adaptation Algorithm Based on Matrix Linear Interpolation

Linguistic tree based maximum likelihood model interpolation

Speaker adaptation using maximum likelihood model interpolation

Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis

Codebook-Based Speaker Adaptation

Unsupervised Speaker Adaptation Of Deep Neural Network Based On The Combination Of Speaker Codes And Singular Value Decomposition For Speech Recognition

Dynamic Speaker Selected Training for Rapid Speaker Adaptation

A New Subspace Based Speaker Adaptation Method

Speaker adaptation based on combination of MAP estimation and weighted neighbor regression

Online Speaker Adaptation Using Memory-Aware Networks for Speech Recognition

Eigenvoice-based MAP Adaptation Within Correlation Subspace

Generalized domain adaptation framework for parametric back-end in speaker recognition

Exploring Gaussian mixture model framework for speaker adaptation of deep neural network acoustic models

Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition

Discriminative Speaker Adaptation with Eigenvoices