Abstract: The hybrid of deep neural networks (DNNs) and hidden Markov models (HMMs) has recently achieved dramatic performance gains in automatic speech recognition (ASR). The DNN-based acoustic model is very powerful, but its learning process is extremely time-consuming. In this paper, we propose a novel DNN-based acoustic modeling framework for speech recognition in which the posterior probabilities of HMM states are computed from multiple DNNs (mDNN) instead of a single large DNN, so that training can be parallelized for faster turnaround. In the proposed mDNN method, all tied HMM states are first grouped into several disjoint clusters using data-driven methods. Next, several hierarchically structured DNNs are trained separately and in parallel for these clusters on multiple computing units (e.g., GPUs). In decoding, the posterior probabilities of HMM states are calculated by combining the outputs of the multiple DNNs. We show that the training procedure of the mDNN under popular criteria, including both frame-level cross-entropy (CE) and sequence-level discriminative training, can be parallelized efficiently to yield a significant speedup. The speedup is mainly attributed to two factors: the multiple DNNs are trained in parallel over multiple GPUs, and each DNN is smaller and is trained on only a subset of the training data. We have evaluated the proposed mDNN method on a 64-hour Mandarin transcription task and the 320-hour Switchboard task. Compared with a conventional DNN, a 4-cluster mDNN model of similar size yields comparable recognition performance on Switchboard (only about 2% performance degradation), with a speedup of more than 7 times in CE training and 2.9 times in sequence training when 4 GPUs are used.
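
The abstract states that state posteriors are obtained by combining the outputs of multiple DNNs but does not give the combination rule. As a minimal sketch, assume a two-level factorization that is natural for disjoint state clusters: one network scores the clusters, one network per cluster scores the states inside it, and P(s | x) = P(c | x) · P(s | c, x) for the cluster c containing state s. The function names, the toy logits, and the factorization itself are illustrative assumptions, not the authors' confirmed method.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_posteriors(cluster_logits, state_logits_per_cluster):
    """Combine a cluster-level output with per-cluster state outputs.

    Assumed rule (not taken from the abstract):
        P(s | x) = P(c | x) * P(s | c, x)
    Returns posteriors for all states, ordered cluster by cluster.
    """
    cluster_post = softmax(cluster_logits)  # P(c | x)
    posteriors = []
    for p_c, state_logits in zip(cluster_post, state_logits_per_cluster):
        within = softmax(state_logits)  # P(s | c, x)
        posteriors.extend(p_c * p for p in within)
    return posteriors


# Hypothetical outputs for one frame: 4 clusters holding 3, 2, 4 and 1 states.
cluster_logits = [0.2, -0.1, 1.3, 0.0]
state_logits_per_cluster = [
    [0.5, 1.0, -0.3],
    [0.0, 0.0],
    [2.0, -1.0, 0.1, 0.4],
    [1.0],
]

posteriors = combine_posteriors(cluster_logits, state_logits_per_cluster)
print(len(posteriors), sum(posteriors))
```

Under this assumption the combined values form a valid distribution over all tied states, and each per-cluster network needs only the frames aligned to its own states, which is consistent with the abstract's remark that each DNN is trained on a subset of the data.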

Automatic Model Redundancy Reduction for Fast Back-Propagation for Deep Neural Networks in Speech Recognition

Deep Neural Network Acceleration with Sparse Prediction Layers

Structured Probabilistic Pruning for Convolutional Neural Network Acceleration

Reshaping deep neural network for fast decoding by node-pruning

Research on Acceleration Method of Speech Recognition Training

AccEPT: an Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training

Fast CNN Pruning via Redundancy-Aware Training

Alternating update layers for DBN-DNN fast training method

GhostRNN: Reducing State Redundancy in RNN with Cheap Operations

Streamlining Speech Enhancement DNNs: an Automated Pruning Method Based on Dependency Graph with Advanced Regularized Loss Strategies

Stochastic data sweeping for fast DNN training

A Cluster-Based Multiple Deep Neural Networks Method for Large Vocabulary Continuous Speech Recognition

Reducing Data Motion to Accelerate the Training of Deep Neural Networks

Fast Training Algorithm for Deep Neural Network Using Multiple GPUs

Inference skipping for more efficient real-time speech enhancement with parallel RNNs

Accelerate CNN via Recursive Bayesian Pruning

Research on Speech Recognition Acceleration Algorithm Using GPU and Deep Belief Network

Identifying and Pruning Redundant Structures for Deep Neural Networks

State-Clustering Based Multiple Deep Neural Networks Modeling Approach for Speech Recognition

Exploiting Symmetric Temporally Sparse BPTT for Efficient RNN Training

DBP: Discrimination Based Block-Level Pruning for Deep Model Acceleration