A Beam-TFDPRNN Based Speech Separation Method in Reverberant Environments

Xu Zhang, Changchun Bao, Jing Zhou, Xue Yang
DOI: https://doi.org/10.1109/icspcc59353.2023.10400232
2023-01-01
Abstract: Beamforming methods based on the time-domain audio separation network (Beam-TasNet) have recently shown satisfactory performance. For example, the performance of the minimum variance distortionless response (MVDR) beamformer can be effectively improved by using the time-domain audio separation network (TasNet). However, reverberation significantly degrades the performance of Beam-TasNet, because multiple reflected sounds corrupt the time-domain features extracted by TasNet. Recent studies show that frequency-domain features are more robust to interference in reverberant environments. This paper therefore proposes an MVDR beamforming method based on the time-frequency domain Dual-Path Recurrent Neural Network (TFDPRNN) for speech separation, called Beam-TFDPRNN. In this method, the TFDPRNN uses a path scanning mechanism that repeatedly scans the input speech signal along both the time and frequency dimensions to capture time-frequency features more comprehensively. The scanned time-frequency features describe the characteristics of the speech sources in reverberant environments more robustly, so the TFDPRNN yields a better pre-separation result. Furthermore, using the pre-separated signals to calculate the spatial covariance matrices yields a more robust MVDR beamformer, so that speech separation in reverberant environments is achieved efficiently. Experimental results based on the WSJ0-2mix corpus show that the proposed method achieves superior performance compared to the reference methods.
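The final step described in the abstract, estimating spatial covariance matrices from pre-separated signals and deriving an MVDR beamformer from them, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the widely used reference-microphone MVDR formulation w = (Φ_N⁻¹ Φ_S) u / tr(Φ_N⁻¹ Φ_S), STFT arrays shaped (frequency, time, microphone), and a small diagonal loading term for numerical stability. The function names `spatial_covariance`, `mvdr_weights`, and `apply_beamformer` are hypothetical, and the pre-separation network itself is not modeled.

```python
import numpy as np

def spatial_covariance(X):
    """Time-averaged spatial covariance per frequency bin.

    X: complex STFT of shape (F, T, M). Returns an array of shape (F, M, M).
    """
    return np.einsum("ftm,ftn->fmn", X, X.conj()) / X.shape[1]

def mvdr_weights(phi_s, phi_n, ref_mic=0, eps=1e-6):
    """MVDR weights in the reference-microphone form (an assumed formulation).

    phi_s, phi_n: target and interference covariances, shape (F, M, M).
    Returns weights of shape (F, M).
    """
    M = phi_s.shape[-1]
    # Diagonal loading (an assumption) keeps phi_n well conditioned.
    load = eps * np.trace(phi_n, axis1=1, axis2=2).real[:, None, None] / M
    phi_n = phi_n + load * np.eye(M)
    # Solve phi_n @ A = phi_s per frequency instead of inverting explicitly.
    numerator = np.linalg.solve(phi_n, phi_s)
    trace = np.trace(numerator, axis1=1, axis2=2)[:, None]
    return numerator[:, :, ref_mic] / (trace + 1e-12)

def apply_beamformer(w, Y):
    """Apply w^H y per time-frequency bin. Y: (F, T, M). Returns (F, T)."""
    return np.einsum("fm,ftm->ft", w.conj(), Y)
```

In a pipeline of this kind, `phi_s` would be computed from the STFT of one pre-separated source across all microphones, and `phi_n` from the remaining signal (for example, the mixture minus that source estimate); both choices are assumptions here rather than details given in the abstract.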