Speech separation based on reliable binaural cues with two-stage neural network in noisy-reverberant environments
Ruwei Li, Tao Li, Xiaoyue Sun, Xingwu Sun, Fengnian Zhao
DOI: https://doi.org/10.1016/j.apacoust.2020.107445
IF: 3.614
2020-11-01
Applied Acoustics
Abstract: Background noise and room reverberation often reduce the reliability of binaural cues and degrade speech quality, especially in non-stationary environments. To address these problems, we propose a novel speech separation algorithm for noisy-reverberant environments, based on a two-stage neural network model and a special separation mask. First, a first-stage neural network derives a weight matrix that is used to construct reliable binaural cues. The reliable binaural cues, combined with complementary spectral features, are used as the input to the separation DNN. Second, a special separation mask is introduced for noisy-reverberant environments, which can suppress background noise and reduce reverberation. Third, the separation DNN is used as a nonlinear function to estimate the separation mask. The two-stage neural network system is then trained jointly. During joint training, the system adaptively adjusts the weight matrix according to the final error, which is similar to an attention mechanism applied to the binaural features. At the same time, because the binaural cues are more reliable, the neural networks can make better use of the effective information. Finally, the estimated separation mask is used to weight the noisy-reverberant speech to obtain the enhanced speech. Experimental results indicate that the proposed algorithm outperforms the comparison algorithms in different scenarios with various amounts of noise and reverberation.
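The data flow described in the abstract (weight the binaural cues, combine them with spectral features, estimate a mask, apply the mask to the noisy-reverberant speech) can be illustrated with a toy forward pass. This is only a minimal sketch under stated assumptions, not the authors' implementation: each stage is replaced by a single untrained sigmoid layer, the mask is assumed to lie in (0, 1) like a ratio-style mask (the paper's "special separation mask" is not defined in the abstract), and all names, sizes, and feature choices (`separate_frame`, `dense_sigmoid`, `random_layer`, one cue per frequency bin) are hypothetical. Joint training is omitted.

```python
import math
import random


def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense_sigmoid(inputs, weights, biases):
    """One fully connected sigmoid layer (a hypothetical stand-in for a DNN)."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_layer(n_out, n_in, rng):
    """Untrained placeholder parameters; a real system would learn these jointly."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def separate_frame(binaural_cues, spectral_features, noisy_magnitude, stage1, stage2):
    """Run one frame through the two hypothetical stages and apply the mask."""
    # Stage 1: one reliability weight in (0, 1) per binaural cue.
    cue_weights = dense_sigmoid(binaural_cues, *stage1)
    reliable_cues = [w * c for w, c in zip(cue_weights, binaural_cues)]

    # Stage 2: estimate a mask from the weighted cues plus spectral features.
    mask = dense_sigmoid(reliable_cues + spectral_features, *stage2)

    # Final step: weight the noisy-reverberant magnitude by the estimated mask.
    enhanced = [m * y for m, y in zip(mask, noisy_magnitude)]
    return cue_weights, mask, enhanced


rng = random.Random(0)
n_bins = 4  # frequency bins per frame (toy size, assumption)
cues = [rng.uniform(-1, 1) for _ in range(n_bins)]     # assumed: one cue per bin
spectral = [rng.uniform(0, 1) for _ in range(n_bins)]  # complementary features
noisy = [rng.uniform(0, 2) for _ in range(n_bins)]     # noisy magnitude spectrum

stage1 = random_layer(n_bins, n_bins, rng)
stage2 = random_layer(n_bins, 2 * n_bins, rng)
cue_weights, mask, enhanced = separate_frame(cues, spectral, noisy, stage1, stage2)
```

Because the first-stage weights multiply the cues before the second stage sees them, an error signal at the mask output would reach both stages during joint training, which is the attention-like behaviour the abstract refers to.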
acoustics