Dynamic noise aware training for speech enhancement based on deep neural networks.

Yong Xu, Jun Du, Li-Rong Dai, Chin-Hui Lee
DOI: https://doi.org/10.21437/interspeech.2014-571
2014-01-01
Abstract: We propose three algorithms to address the mismatch problem in deep neural network (DNN) based speech enhancement. First, we investigate noise aware training, which incorporates noise information in the test utterance using an ideal binary mask based dynamic noise estimation approach, to improve the DNN's ability to separate speech from the noisy signal. Next, a set of more than 100 noise types is adopted to enrich the generalization capabilities of the DNN to unseen and non-stationary noise conditions. Finally, the quality of the enhanced speech is further improved by global variance equalization. Empirical results show that each of the three proposed techniques contributes to the performance improvement. Compared to the conventional logarithmic minimum mean squared error speech enhancement method, our DNN system achieves a 0.32 PESQ (perceptual evaluation of speech quality) improvement across six signal-to-noise ratio levels ranging from -5 dB to 20 dB on a test set with unknown noise types. We also observe that the combined strategies suppress highly non-stationary noise better than all the competing state-of-the-art techniques we have evaluated.
Index Terms: Speech enhancement, deep neural networks, noise aware training, ideal binary mask, non-stationary noise
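The abstract names two signal-level operations without giving formulas: noise estimation guided by an ideal binary mask, and global variance equalization. The sketch below is one plausible reading of each, not the authors' implementation. All function names, the 0 dB mask threshold, the per-bin averaging of noise-dominated units, and the choice to rescale around the global mean are assumptions made for illustration.

```python
import math

EPS = 1e-12  # assumed floor to avoid log(0) and division by zero


def ideal_binary_mask(speech_power, noise_power, threshold_db=0.0):
    """Return 1 where the local SNR exceeds threshold_db, else 0.

    Inputs are lists of frames, each frame a list of per-bin power values.
    The 0 dB default threshold is an assumption.
    """
    mask = []
    for s_frame, n_frame in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_frame, n_frame):
            snr_db = 10.0 * math.log10((s + EPS) / (n + EPS))
            row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask


def masked_noise_estimate(noisy_log_power, mask):
    """Average the noisy log-power per bin over noise-dominated units.

    Assumed scheme: units with mask == 0 are treated as noise. If a bin
    has no such unit, fall back to averaging all frames for that bin.
    """
    n_bins = len(noisy_log_power[0])
    estimate = []
    for k in range(n_bins):
        vals = [f[k] for f, m in zip(noisy_log_power, mask) if m[k] == 0]
        if not vals:
            vals = [f[k] for f in noisy_log_power]
        estimate.append(sum(vals) / len(vals))
    return estimate


def global_variance(features):
    """Variance pooled over every frame and every bin."""
    values = [v for frame in features for v in frame]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def equalize_global_variance(enhanced, reference_variance):
    """Rescale enhanced features so their pooled variance matches a reference.

    Assumption: a single scale factor sqrt(reference / estimated) is
    applied around the global mean of the enhanced features.
    """
    values = [v for frame in enhanced for v in frame]
    mean = sum(values) / len(values)
    scale = math.sqrt(reference_variance / max(global_variance(enhanced), EPS))
    return [[mean + scale * (v - mean) for v in frame] for frame in enhanced]
```

Computing the mask this way needs the clean speech and the noise separately, which are only available for training data; at test time a system would have to rely on an estimated mask instead.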