Abstract: In this paper we investigate GMM-derived (GMMD) features for the adaptation of deep neural network (DNN) acoustic models. A DNN trained on GMMD features is adapted through maximum a posteriori (MAP) adaptation of the auxiliary GMM used for GMMD feature extraction. We explore the fusion of adapted GMMD features with conventional features, such as bottleneck and MFCC features, in two neural network architectures: the DNN and the time-delay neural network (TDNN). We compare the proposed approach with other adaptation techniques, namely i-vectors and feature-space maximum likelihood linear regression (fMLLR), and explore their complementarity using several types of fusion (feature level, posterior level, lattice level and others) to find the best way to combine them. Experimental results on the TED-LIUM corpus show that the proposed adaptation technique can be effectively integrated into DNN and TDNN setups at different levels and provides additional gains in recognition performance: up to 6% relative word error rate reduction (WERR) over a strong fMLLR speaker-adapted DNN baseline, and up to 18% relative WERR over a speaker-independent (SI) DNN baseline trained on conventional features. For TDNN models, the proposed approach achieves up to 26% relative WERR over an SI baseline and up to 13% over a model adapted with i-vectors. An analysis of the adapted GMMD features from several points of view demonstrates their effectiveness at different levels.
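To make the adaptation idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a diagonal-covariance GMM, the standard relevance-factor MAP update of the means only, and GMMD features defined as per-component log-likelihoods; the abstract does not specify any of these details. The function names and the relevance factor `tau` are hypothetical.

```python
import numpy as np


def log_gauss_diag(x, means, variances):
    """Per-frame, per-component log-likelihoods for diagonal Gaussians.

    x: (T, D) frames; means, variances: (K, D). Returns an array of shape (T, K).
    """
    diff = x[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    mahalanobis = np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (T, K)
    return -0.5 * (log_norm[None, :] + mahalanobis)


def map_adapt_means(x, weights, means, variances, tau=5.0):
    """MAP update of the GMM means from adaptation frames x (assumed formula).

    tau is a hypothetical relevance factor: larger values keep the adapted
    means closer to the speaker-independent ones.
    """
    log_post = np.log(weights)[None, :] + log_gauss_diag(x, means, variances)
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    gamma = np.exp(log_post)      # (T, K) component posteriors per frame
    n_k = gamma.sum(axis=0)       # (K,) soft frame counts per component
    first_order = gamma.T @ x     # (K, D) posterior-weighted sums of frames
    return (tau * means + first_order) / (tau + n_k)[:, None]


def gmmd_features(x, means, variances):
    """GMMD feature vector per frame: one log-likelihood per component (assumed)."""
    return log_gauss_diag(x, means, variances)
```

Under these assumptions, the DNN input changes only because the auxiliary GMM changes: features are extracted with `gmmd_features(x, adapted_means, variances)` instead of the speaker-independent means, while the DNN weights stay fixed.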
