Improving Deep Neural Network Based Speech Enhancement in Low SNR Environments

Tian Gao, Jun Du, Yong Xu, Cong Liu, Li-Rong Dai, Chin-Hui Lee
DOI: https://doi.org/10.1007/978-3-319-22482-4_9
2015-01-01
Abstract: We propose a joint framework combining speech enhancement (SE) and voice activity detection (VAD) to increase speech intelligibility in low signal-to-noise ratio (SNR) environments. Deep neural networks (DNNs) have recently been successfully adopted as regression models in SE. Nonetheless, performance in harsh environments is not always satisfactory, because the noise energy often dominates in certain speech segments and causes speech distortion. Based on an analysis of frame-level SNR information in the training set, our approach consists of two steps: (1) a DNN-based VAD model is trained to generate frame-level speech/non-speech probabilities; and (2) the final enhanced speech features are obtained as a weighted sum of the estimated clean speech features, processed by incorporating the VAD information. Experimental results demonstrate that the proposed SE approach improves short-time objective intelligibility (STOI) by 0.161 and perceptual evaluation of speech quality (PESQ) by 0.333 over the already-good SE baseline systems at -5 dB SNR of babble noise.
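The second step can be illustrated with a minimal sketch. The abstract does not give the exact weighting rule, so the form below is an assumption: it supposes two hypothetical per-frame clean-feature estimates, `est_speech` (suited to speech-dominant frames) and `est_nonspeech` (suited to noise-dominant frames), blended linearly by the frame-level VAD speech probability `p_speech`. The function name and all argument names are illustrative, not taken from the paper.

```python
import numpy as np

def vad_weighted_sum(est_speech, est_nonspeech, p_speech):
    """Blend two clean-feature estimates frame by frame.

    ASSUMPTION: a linear blend p * a + (1 - p) * b; the abstract only says
    "weighted sum", so the actual weighting may differ.

    est_speech, est_nonspeech: arrays of shape (frames, feature_dim)
    p_speech: array of shape (frames,), VAD speech probability per frame
    """
    est_speech = np.asarray(est_speech, dtype=float)
    est_nonspeech = np.asarray(est_nonspeech, dtype=float)
    # Clip to [0, 1] and add a trailing axis so it broadcasts over features.
    p = np.clip(np.asarray(p_speech, dtype=float), 0.0, 1.0)[:, None]

    if est_speech.shape != est_nonspeech.shape:
        raise ValueError("feature estimates must have the same shape")
    if est_speech.shape[0] != p.shape[0]:
        raise ValueError("need one VAD probability per frame")

    return p * est_speech + (1.0 - p) * est_nonspeech


# Toy usage: 3 frames, 2 feature dimensions.
blended = vad_weighted_sum(
    est_speech=np.ones((3, 2)),
    est_nonspeech=np.zeros((3, 2)),
    p_speech=[0.0, 0.5, 1.0],
)
```

In this toy case, a frame with speech probability 0 takes the non-speech estimate unchanged, a frame with probability 1 takes the speech estimate, and intermediate probabilities interpolate between the two.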