Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR

Felix Weninger, Hakan Erdogan, Shinji Watanabe, Emmanuel Vincent, Jonathan Le Roux, John R. Hershey, Björn Schuller
DOI: https://doi.org/10.1007/978-3-319-22482-4_11
2015-01-01
Abstract: We evaluate some recent developments in recurrent neural network (RNN) based speech enhancement in the light of noise-robust automatic speech recognition (ASR). The proposed framework is based on Long Short-Term Memory (LSTM) RNNs which are discriminatively trained according to an optimal speech reconstruction objective. We demonstrate that LSTM speech enhancement, even when used ‘naïvely’ as front-end processing, delivers competitive results on the CHiME-2 speech recognition task. Furthermore, simple, feature-level fusion based extensions to the framework are proposed to improve the integration with the ASR back-end. These yield a best result of 13.76 % average word error rate, which is, to our knowledge, the best score to date.