Abstract: Recent work has demonstrated the impressive success of bottleneck (BN) features in speech recognition, particularly with deep networks and appropriate pre-training. A widely acknowledged advantage of BN features is that the network can learn multiple environmental conditions when abundant training data are available. For tasks with limited training data, however, such multi-condition training is not feasible, so the networks tend to overfit and become sensitive to changes in acoustic conditions. One possible solution is to base the BN features on a channel-robust primary feature. In this paper, we propose deriving BN features from Gammatone frequency cepstral coefficients (GFCCs). The GFCC feature has shown good robustness against acoustic changes, owing to its ability to simulate the human auditory system. The idea is to combine the acoustic robustness of the GFCC feature with the representational power of the BN feature, so that BN features improve under mismatched training/test channels. This is particularly useful for small-scale tasks, where training data are often limited. Experiments are conducted on the WSJCAM0 database, with test utterances mixed with noise at various SNR levels to simulate channel changes. The results confirm that the GFCC-based BN feature is much more robust than BN features based on MFCCs and PLPs. Furthermore, the primary GFCC feature and the GFCC-based BN feature can be concatenated, yielding a more robust combined feature that provides considerable performance gains in all tested noise conditions.
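The abstract does not spell out how GFCCs are computed, how the noisy test set is built, or how the two feature streams are combined. Below is a minimal numpy sketch, under stated assumptions, of one common GFCC formulation (ERB-spaced 4th-order gammatone filterbank, frame energies, cube-root loudness compression, DCT), a helper that mixes noise into an utterance at a target SNR, and frame-wise concatenation of a primary feature with BN features. All function names, the 64-channel filterbank, the 23 cepstra, the 25 ms/10 ms framing and the 50 Hz to 0.9 x Nyquist frequency range are hypothetical illustrative choices, not the configuration used in the paper. The BN features themselves would come from the bottleneck layer of a trained network, which is not shown here.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so that 10*log10(P_clean / P_noise) == snr_db."""
    noise = np.resize(noise, clean.shape)  # repeat or trim noise to the utterance length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise


def erb_space(low_hz, high_hz, n):
    """Center frequencies equally spaced on the ERB-rate scale (Glasberg & Moore)."""
    to_erb = lambda f: 21.4 * np.log10(4.37e-3 * f + 1)
    from_erb = lambda e: (10 ** (e / 21.4) - 1) / 4.37e-3
    return from_erb(np.linspace(to_erb(low_hz), to_erb(high_hz), n))


def gammatone_ir(fc, sr, dur=0.05, order=4):
    """Unit-energy impulse response of a gammatone filter centered at fc (Hz)."""
    t = np.arange(int(dur * sr)) / sr
    b = 1.019 * 24.7 * (4.37e-3 * fc + 1)  # bandwidth = 1.019 * ERB(fc)
    ir = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return ir / np.sqrt(np.sum(ir ** 2))


def dct2(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T


def gfcc(signal, sr, n_filters=64, n_ceps=23, frame_s=0.025, hop_s=0.010):
    """GFCC sketch: gammatone filterbank -> frame energies -> cube root -> DCT."""
    frame, hop = int(frame_s * sr), int(hop_s * sr)
    n_frames = 1 + (len(signal) - frame) // hop
    # Sample indices of every analysis frame, shape (n_frames, frame).
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame)[None, :]
    energies = np.empty((n_frames, n_filters))
    for j, fc in enumerate(erb_space(50.0, 0.9 * sr / 2, n_filters)):
        y = np.convolve(signal, gammatone_ir(fc, sr), mode="same")
        energies[:, j] = np.mean(y[idx] ** 2, axis=1)
    # Cube-root loudness compression, then decorrelate with a DCT.
    return dct2(np.cbrt(energies), n_ceps)


def concat_features(primary, bottleneck):
    """Frame-wise concatenation of a primary feature (e.g. GFCC) with BN features."""
    if primary.shape[0] != bottleneck.shape[0]:
        raise ValueError("feature streams must have the same number of frames")
    return np.concatenate([primary, bottleneck], axis=1)


if __name__ == "__main__":
    sr = 16000
    rng = np.random.default_rng(0)
    clean = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s test tone
    noisy = mix_at_snr(clean, rng.standard_normal(sr), snr_db=5.0)
    feats = gfcc(noisy, sr)
    bn = np.zeros((feats.shape[0], 40))  # placeholder for real BN-layer outputs
    print(concat_features(feats, bn).shape)  # (98, 63)
```

The cube-root compression (rather than the logarithm used for MFCCs) is a common choice in GFCC formulations; swapping in a log, or a different number of channels, only changes the `gfcc` internals, not the mixing or concatenation steps.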

Improving BLSTM RNN Based Mandarin Speech Recognition Using Accent Dependent Bottleneck Features

Improving Accented Mandarin Speech Recognition by Using Recurrent Neural Network Based Language Model Adaptation

Recurrent Neural Network Based Language Model Adaptation for Accent Mandarin Speech

Improved BLSTM RNN Based Accent Speech Recognition Using Multi-task Learning and Accent Embeddings

CTC Regularized Model Adaptation for Improving LSTM RNN Based Multi-Accent Mandarin Speech Recognition

Investigation of Deep Neural Network Acoustic Modelling Approaches for Low Resource Accented Mandarin Speech Recognition

Bottleneck Features Based On Gammatone Frequency Cepstral Coefficients

Improving Bottleneck Features for Automatic Speech Recognition Using Gammatone-Based Cochleagram and Sparsity Regularization

Accent Recognition with Hybrid Phonetic Features

Improvements on bottleneck feature for large vocabulary continuous speech recognition

Attentive batch normalization for LSTM-based acoustic modeling of speech recognition

Effective Acoustic Modeling for Pronunciation Quality Scoring of Strongly Accented Mandarin Speech

Modeling Speaker Variability Using Long Short-Term Memory Networks For Speech Recognition

Multilingual Bottleneck Features for Improving ASR Performance of Code-Switched Speech in Under-Resourced Languages

Rapid Adaptation For Deep Neural Networks Through Multi-Task Learning

Leveraging native language information for improved accented speech recognition

Cantonese Automatic Speech Recognition Using Transfer Learning from Mandarin

Cascade RNN-Transducer: Syllable Based Streaming On-device Mandarin Speech Recognition with a Syllable-to-Character Converter

Improved Bottleneck Feature Using Hierarchical Deep Belief Networks for Keyword Spotting in Continuous Speech

Multilingual Approach to Joint Speech and Accent Recognition with DNN-HMM Framework

Cross-language transfer learning for deep neural network based speech enhancement