Robust Speech Recognition With Speech Enhanced Deep Neural Networks

Jun Du,Qing Wang,Tian Gao,Yong Xu,Lirong Dai,Chin-Hui Lee
DOI: https://doi.org/10.21437/interspeech.2014-148
2014-01-01
Abstract:We propose a signal pre-processing front-end to enhance speech based on deep neural networks (DNNs) and use the enhanced speech features directly to train hidden Markov models (HMMs) for robust speech recognition. As a comprehensive study, we examine its effectiveness for different acoustic features, acoustic models, and training-testing combinations. Tested on the Aurora4 task the experimental results indicate that our proposed framework consistently outperform the stateof-the-art speech recognition systems in all evaluation conditions. To our best knowledge, this is the first showcase on the Aurora4 task yielding performance gains by using only an enhancement pre-processor without any adaptation or compensation post-processing on top of the best DNN-HMM system. The word error rate reduction from the baseline system is up to 50% for clean-condition training and 15% for multi-condition training. We believe the system performance could be improved further by incorporating post-processing techniques to work coherently with the proposed enhancement pre-processing scheme.
What problem does this paper attempt to address?