Abstract: Discriminative training techniques define state-of-the-art performance for automatic speech recognition systems. However, they are inherently prone to overfitting, leading to poor generalization performance when using limited training data. In order to address this issue, this paper presents a full Bayesian framework to account for model uncertainty in sequence discriminative training of factored TDNN acoustic models. Several Bayesian learning based TDNN variant systems are proposed to model the uncertainty over weight parameters and choices of hidden activation functions, or the hidden layer outputs. Efficient variational inference approaches using as few as one single parameter sample keep their computational cost in both training and evaluation comparable to that of the baseline TDNN systems. Statistically significant word error rate (WER) reductions of 0.4%-1.8% absolute (5%-11% relative) were obtained over a state-of-the-art LF-MMI factored TDNN baseline system trained on the 900 hour speed perturbed Switchboard corpus. This baseline used multiple regularization methods including F-smoothing, L2 norm penalty, natural gradient, model averaging and dropout, in addition to i-Vector plus learning hidden unit contribution (LHUC) based speaker adaptation and RNNLM rescoring. Consistent performance improvements were also obtained on a 450 hour HKUST conversational Mandarin telephone speech recognition task. On a third cross domain adaptation task requiring rapidly porting a system trained on 1000 hours of LibriSpeech data to a small DementiaBank elderly speech corpus, the proposed Bayesian TDNN LF-MMI systems outperformed the baseline system using direct weight fine-tuning by up to 2.5% absolute WER reduction.
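The single-sample variational inference mentioned in the abstract can be illustrated with a minimal, generic sketch. It is not the authors' implementation: it assumes a factorized Gaussian posterior over the weights of one linear unit and a zero-mean Gaussian prior, and the names `sample_weight`, `kl_to_prior` and `bayesian_dot` are hypothetical. One weight sample is drawn per forward pass with the reparameterization trick, and a KL term would be added to the training criterion (LF-MMI in the paper, left out here).

```python
import math
import random


def sample_weight(mean, log_std, rng):
    """Draw one weight sample: w = mean + std * eps, with eps ~ N(0, 1)."""
    return mean + math.exp(log_std) * rng.gauss(0.0, 1.0)


def kl_to_prior(means, log_stds, prior_std=1.0):
    """Sum of KL(N(mean, std^2) || N(0, prior_std^2)) over all weights.

    Assumption: independent Gaussian posterior and prior per weight.
    """
    total = 0.0
    for m, s in zip(means, log_stds):
        var = math.exp(2.0 * s)
        total += 0.5 * (
            (var + m * m) / prior_std ** 2
            - 1.0
            - 2.0 * s
            + 2.0 * math.log(prior_std)
        )
    return total


def bayesian_dot(x, means, log_stds, rng):
    """One stochastic forward pass of a linear unit using a single weight sample."""
    return sum(
        xi * sample_weight(m, s, rng)
        for xi, m, s in zip(x, means, log_stds)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0, 2.0]
    means = [0.5, -1.0]
    log_stds = [-2.0, -2.0]
    output = bayesian_dot(x, means, log_stds, rng)
    penalty = kl_to_prior(means, log_stds)
    print(output, penalty)
```

Because only one sample is drawn per pass, the cost is close to that of a deterministic layer, which is the property the abstract highlights. At evaluation time a common simplification is to use the posterior means directly; whether the paper does so is not stated in the abstract.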

Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition