Abstract: The hybrid of deep neural networks (DNNs) and hidden Markov models (HMMs) has recently achieved dramatic performance gains in automatic speech recognition (ASR). The DNN-based acoustic model is very powerful, but its training is extremely time-consuming. In this paper, we propose a novel DNN-based acoustic modeling framework for speech recognition in which the posterior probabilities of HMM states are computed from multiple DNNs (mDNN) instead of a single large DNN, so that training can be parallelized for faster turnaround. In the proposed mDNN method, all tied HMM states are first grouped into several disjoint clusters using data-driven methods. Next, several hierarchically structured DNNs are trained separately and in parallel for these clusters on multiple computing units (e.g. GPUs). In decoding, the posterior probabilities of HMM states are calculated by combining the outputs of the multiple DNNs. We show that training of the mDNN under popular criteria, including both frame-level cross-entropy (CE) and sequence-level discriminative training, can be parallelized efficiently to yield significant speedup. The speedup is mainly attributed to two factors: the multiple DNNs are trained in parallel over multiple GPUs, and each DNN is smaller and is trained on only a subset of the training data. We have evaluated the proposed mDNN method on a 64-hour Mandarin transcription task and the 320-hour Switchboard task. Compared to the conventional DNN, a 4-cluster mDNN model of similar size yields comparable recognition performance on Switchboard (only about 2% performance degradation), with a speedup of more than 7 times in CE training and 2.9 times in sequence training when 4 GPUs are used.
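The decoding step, in which HMM state posteriors are obtained by combining outputs from multiple DNNs, can be illustrated with a small sketch. The abstract does not spell out the combination rule, so the code below assumes one natural reading of the hierarchical structure: a top-level network produces posteriors over the state clusters, each cluster-specific network produces posteriors over the states within its own cluster, and the two are multiplied, p(s | x) = p(c | x) · p(s | c, x), where c is the cluster containing state s. The function names, the toy logits, and the cluster assignment are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_posteriors(cluster_logits, per_cluster_state_logits, cluster_states):
    """Combine the outputs of several networks into one state posterior.

    Assumed factorization (not confirmed by the abstract):
        p(state | x) = p(cluster | x) * p(state | cluster, x)

    cluster_logits:           output of a hypothetical top-level cluster network
    per_cluster_state_logits: one logit vector per cluster-specific network
    cluster_states:           state ids in each (disjoint) cluster
    """
    cluster_post = softmax(cluster_logits)
    state_post = {}
    for c, p_cluster in enumerate(cluster_post):
        within_cluster = softmax(per_cluster_state_logits[c])
        for state_id, p_state in zip(cluster_states[c], within_cluster):
            state_post[state_id] = p_cluster * p_state
    return state_post


# Toy example: 5 tied states split into 2 disjoint clusters.
cluster_states = [[0, 1], [2, 3, 4]]
posteriors = combine_posteriors(
    cluster_logits=[0.2, -0.1],
    per_cluster_state_logits=[[1.0, 0.5], [0.3, 0.3, 2.0]],
    cluster_states=cluster_states,
)
print(posteriors)
```

Because the clusters are disjoint and each softmax sums to one, the combined values form a valid distribution over all tied states. Under this assumed factorization, each cluster-specific network can be evaluated (and trained) independently of the others, which is what makes one-network-per-GPU parallelism possible.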

An Appropriate Parallel HMM for Speaker-Independent Speech Recognition

Speaker-independent speech recognition based on HMM state-restructuring method

Hybrid speech recognition based on improved hidden markov model and neural network

A fused hidden Markov model with application to bimodal speech processing

From Linear Prediction HMM to a New Combined Model for Speech Recognition

A New Hybrid Hmm/Ann Model For Speech Recognition

Adaptive Speaker Recognition Based on Hidden Markov Model Parameter Optimization

Speaker‐independent Phoneme Recognition Using Hidden Markov Models

Speech Recognition Algorithm Based on Neural Network and Hidden Markov Model

A Parameter Transfer Method for HMM-DNN Heterogeneous Model with the Scarce Mongolian Data Set

A Hybrid Speech Recognition System Based on HMM/ANN

Parametric model of introducing inter-frame correlation information into hidden markov model for speech recognition

State-Clustering Based Multiple Deep Neural Networks Modeling Approach for Speech Recognition

An Unsupervised Speaker Adaptation Method Based on VQ-HMM

Application of Hidden Markov Models in Speech Command Recognition

The Cohort-Selection And Normalized Hidden Markov Model For Speaker Recognition

A Comparative Study of Discrete, Semicontinuous, and Continuous Hidden Markov Models.

Large Vocabulary Continuous Speech Recognition System Based on Hybrid Hidden Markov Model (HMM) and Artificial Neural Network (ANN)

An inhomogeneous HMM speech recognition algorithm

Key Technology Research for Speech Recognition

The Hidden Markov Model of co-articulation and its application to the continuous speech recognition