Abstract: Recent work demonstrates the success of the bottleneck (BN) feature in speech recognition, particularly with deep networks and appropriate pre-training. A widely acknowledged advantage of the BN feature is that the network can learn multiple environmental conditions when training data are abundant. For tasks with limited training data, however, such multi-condition training is unavailable, so the networks tend to over-fit and become sensitive to changes in acoustic conditions. A possible solution is to base the BN features on a channel-robust primary feature.

In this paper, we propose to derive the BN feature from Gammatone frequency cepstral coefficients (GFCCs). The GFCC feature has shown good robustness against acoustic change, owing to its ability to simulate the human auditory system. The idea is to combine the acoustic robustness of the GFCC feature with the representational power of the BN feature, so that the BN feature performs better when the training and test channels are mismatched. This is particularly useful for small-scale tasks, for which training data are often limited. The experiments are conducted on the WSJCAM0 database, where the test utterances are mixed with noise at various SNR levels to simulate channel change. The results confirm that the GFCC-based BN feature is much more robust than the BN features based on the MFCC and the PLP. Furthermore, the primary GFCC feature and the GFCC-based BN feature can be concatenated, yielding a more robust combined feature that provides considerable performance gains in all the tested noise conditions.
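The two data-handling steps mentioned in the abstract, mixing test utterances with noise at a chosen SNR and concatenating a primary feature with a BN feature, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mix_at_snr` and `concat_features`, the noise-tiling strategy, and the feature dimensions in the usage note are all assumptions made for illustration.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the speech-to-noise power ratio is snr_db.

    Hypothetical helper: the abstract does not specify how the noise was
    scaled or aligned. Here the noise is tiled or trimmed to the speech length.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat the noise if it is shorter than the speech, then trim to length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]

    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0.0:
        raise ValueError("noise signal has zero power")

    # Choose the gain g so that 10 * log10(p_speech / (g^2 * p_noise)) == snr_db.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def concat_features(primary, bottleneck):
    """Concatenate two frame-aligned feature matrices along the feature axis.

    Both inputs are assumed to have shape (num_frames, feature_dim).
    """
    primary = np.asarray(primary)
    bottleneck = np.asarray(bottleneck)
    if primary.shape[0] != bottleneck.shape[0]:
        raise ValueError("feature matrices must have the same number of frames")
    return np.concatenate([primary, bottleneck], axis=1)
```

For example, `mix_at_snr(speech, noise, 10.0)` returns a noisy signal at 10 dB SNR, and `concat_features(gfcc, bn)` on matrices of shape `(T, 13)` and `(T, 40)` returns a matrix of shape `(T, 53)`; these dimensions are placeholders, not values taken from the paper.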

Study on Continuous Speech Recognition based on Bottleneck Features for Lhasa-Tibetan Dialect

Speech Bottleneck Feature Extraction Method Based on Overlapping Group Lasso Sparse Deep Neural Network

Mongolian acoustic modeling based on deep neural network

An investigation on DNN-derived bottleneck features for GMM-HMM based robust speech recognition

Improvement of Distant-Talking Speaker Identification Using Bottleneck Features of DNN

Investigation on Acoustic Modeling with Different Phoneme Set for Continuous Lhasa Tibetan Recognition Based on DNN Method

Investigation on dimensionality reduction of concatenated features with deep neural network for LVCSR systems

Improvements on bottleneck feature for large vocabulary continuous speech recognition

Improving BLSTM RNN Based Mandarin Speech Recognition Using Accent Dependent Bottleneck Features

Bottleneck Features Based On Gammatone Frequency Cepstral Coefficients

Speech Recognition Based on Deep Neural Networks on Tibetan Corpus

Language Identification with Deep Bottleneck Features

Mongolian Speech Recognition Based on Deep Neural Networks

Deep Neural Network Derived Bottleneck Features For Accurate Audio Classification

Deep Feature Learning For Tibetan Speech Recognition Using Sparse Auto-Encoder

Incoherent training of deep neural networks to de-correlate bottleneck features for speech recognition

Improving Bottleneck Features for Automatic Speech Recognition Using Gammatone-Based Cochleagram and Sparsity Regularization

Speaker Recognition System Based on Deep Neural Networks and Bottleneck Features

Tibetan Language Continuous Speech Recognition Based on Dynamic Bayesian Network

Improved Language Identification Using Deep Bottleneck Network