Abstract: Recently, the pre-trained context-dependent hybrid deep neural network (DNN) and HMM method has achieved significant performance gains in many large-scale automatic speech recognition (ASR) tasks. However, the error back-propagation (BP) algorithm used to train neural networks is sequential in nature and is hard to parallelize across multiple computing threads. As a result, training a deep neural network is extremely time-consuming even with a modern GPU board. In this paper we propose a new acoustic modelling framework that uses multiple DNNs instead of a single DNN to compute the posterior probabilities of tied HMM states. In our method, all tied states of the context-dependent HMMs are first grouped into several disjoint clusters based on the training data associated with these HMM states. Then, several hierarchically structured DNNs are trained separately on these disjoint clusters of data using multiple GPUs. In decoding, the final posterior probability of each tied HMM state is calculated from the output posteriors of the multiple DNNs. We have evaluated the proposed method on a 64-hour Mandarin transcription task and the 309-hour Switchboard Hub5 task. Experimental results show that the new method using cluster-based multiple DNNs can achieve an over 5-fold reduction in total training time with only negligible performance degradation (about 1-2% on average) when using 3 or 4 GPUs, respectively.
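
The abstract does not state the exact formula used to combine the outputs of the multiple DNNs. One plausible reading of "hierarchically structured", assumed here purely for illustration, is that a top-level network estimates the posterior of each cluster, and one network per cluster estimates the posterior of each tied state within that cluster, so that P(state | frame) = P(cluster | frame) * P(state | cluster, frame). The function name `combine_posteriors`, its parameters, and the toy numbers below are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def combine_posteriors(
    cluster_post: Sequence[float],
    state_post_per_cluster: Sequence[Sequence[float]],
    cluster_states: Sequence[Sequence[int]],
    num_states: int,
) -> List[float]:
    """Combine per-cluster DNN outputs into one posterior over all tied states.

    Assumed factorization (not confirmed by the abstract):
        P(state | x) = P(cluster | x) * P(state | cluster, x)

    cluster_post[c]              -- P(cluster c | x) from a top-level network
    state_post_per_cluster[c][k] -- P(k-th state of cluster c | cluster c, x)
    cluster_states[c][k]         -- global index of the k-th state of cluster c
    """
    if not (len(cluster_post) == len(state_post_per_cluster) == len(cluster_states)):
        raise ValueError("all inputs must describe the same number of clusters")

    full = [0.0] * num_states
    for p_cluster, within, states in zip(
        cluster_post, state_post_per_cluster, cluster_states
    ):
        if len(within) != len(states):
            raise ValueError("posterior vector and state list differ in length")
        # Clusters are disjoint, so each global state index is written once.
        for state_index, p_state in zip(states, within):
            full[state_index] = p_cluster * p_state
    return full


# Toy example for a single frame: 5 tied states split into 2 disjoint clusters.
cluster_post = [0.7, 0.3]
state_post_per_cluster = [[0.5, 0.5], [0.2, 0.3, 0.5]]
cluster_states = [[0, 3], [1, 2, 4]]

posterior = combine_posteriors(
    cluster_post, state_post_per_cluster, cluster_states, num_states=5
)
print(posterior)
```

Under this assumed factorization the combined vector is itself a valid distribution over all tied states, because each cluster's within-cluster posteriors sum to one and the cluster posteriors sum to one. That is what would let it stand in for the output of a single large DNN during decoding.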

Continuous Speech Recognition for Large Vocabulary Based on Triphone DBN Model

Continuous Speech Recognition Based on the Triphone DDBHMM

Research on Context-Dependent Acoustical Unit (Triphone) for Mandarin Continuous Speech Recognition

Investigation of deep neural networks (DNN) for large vocabulary continuous speech recognition: Why DNN surpasses GMMs in acoustic modeling

Large Vocabulary Continuous Speech Recognition with Deep Recurrent Network

Deep LSTM for Large Vocabulary Continuous Speech Recognition

A Cluster-Based Multiple Deep Neural Networks Method for Large Vocabulary Continuous Speech Recognition

Deep Neural Network based Uyghur Large Vocabulary Continuous Speech Recognition

A two-layer lexical tree based beam search in continuous Chinese speech recognition

Continuous speech recognition method and continuous speech recognition system

Large Vocabulary Continuous Speech Recognition System Based on Hybrid Hidden Markov Model (HMM) and Artificial Neural Network (ANN)

Compact Feedforward Sequential Memory Networks for Large Vocabulary Continuous Speech Recognition

PHMM Based Asynchronous Acoustic Model for Chinese Large Vocabulary Continuous Speech Recognition

One-stage Search Algorithm for Large Vocabulary Continuous Speech Recognition Based on DDBHMM

Phonetic Classification-Based Triphone for Continuous Mandarin Speech Recognition

Integrated Method of Deep Learning and Large Language Model in Speech Recognition

A comparative study on selecting acoustic modeling units in deep neural networks based large vocabulary Chinese speech recognition

Speech Recognition Based on Deep Neural Networks on Tibetan Corpus

Research on Inter-Syllable Context-Dependent Acoustic Unit for Mandarin Continuous Speech Recognition

Context Dependent Initial/final Acoustic Modeling for Continuous Chinese Speech Recognition

DBN Based Multi-Stream Models for Speech