Masking-based Neural Beamformer for Multichannel Speech Enhancement

Shuai Nie, Shan Liang, Zhanlei Yang, Longshuai Xiao, Wenju Liu, Jianhua Tao
DOI: https://doi.org/10.1109/ISCSLP57327.2022.10037878
2022-01-01
Abstract: Masking and beamforming techniques have shown considerable promise for multichannel speech enhancement. Masking can significantly reduce noise but inevitably introduces speech distortion, especially in far-field reverberant environments, whereas beamforming can effectively avoid speech distortion and performs very well in reverberant conditions. A masking-based beamforming scheme is therefore a natural alternative. However, most methods use fixed or adaptive beamformers as spatial filters. Fixed beamformers are designed in advance under a specific sound-field assumption and offer limited noise reduction, while adaptive beamformers require a complex matrix inversion at each frequency, which incurs high computational complexity and instability. In this paper, we propose a fully learnable masking neural beamformer that jointly models masking and beamforming in a data-driven manner. Mask prediction and the neural beamformer are jointly optimized with spectrum and waveform approximation objectives. To improve directional discrimination in reverberant and diffuse-noise environments, we further propose using a pair of complementary fixed beamformers to extract a directional coherence feature (DCF) for mask prediction. Systematic experiments demonstrate that the proposed approach is competitive with available methods in terms of speech enhancement and speech recognition.
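
For context, the conventional pipeline that the abstract contrasts against can be sketched as follows. This is not the proposed neural beamformer. It is a minimal illustration of a mask-based adaptive beamformer, assuming an MVDR formulation with a reference channel (the abstract does not say which adaptive beamformer is meant). The function names, array shapes, and the diagonal-loading constant are all hypothetical choices made for this example. The per-frequency linear solve in `mvdr_weights` is the matrix inversion step that the abstract identifies as costly and potentially unstable.

```python
import numpy as np


def masked_covariance(stft, mask, eps=1e-8):
    """Mask-weighted spatial covariance per frequency.

    stft: complex array of shape (C, F, T) (channels, frequencies, frames).
    mask: real array of shape (F, T) with values in [0, 1].
    Returns an array of shape (F, C, C).
    """
    weighted = stft * mask[None, :, :]
    # Sum over frames of mask * y y^H for every frequency bin.
    cov = np.einsum("cft,dft->fcd", weighted, stft.conj())
    return cov / (mask.sum(axis=-1)[:, None, None] + eps)


def mvdr_weights(phi_s, phi_n, ref_channel=0, diag_load=1e-6):
    """MVDR weights from speech and noise covariances, shape (F, C).

    Assumed form: w = (Phi_n^{-1} Phi_s) u_ref / trace(Phi_n^{-1} Phi_s).
    """
    num_channels = phi_n.shape[-1]
    # Diagonal loading (an assumption of this sketch) to stabilize the solve.
    scale = np.trace(phi_n, axis1=-2, axis2=-1).real / num_channels
    phi_n = phi_n + diag_load * scale[:, None, None] * np.eye(num_channels)
    # One C x C linear system per frequency bin.
    numerator = np.linalg.solve(phi_n, phi_s)
    trace = np.trace(numerator, axis1=-2, axis2=-1)[:, None]
    return numerator[:, :, ref_channel] / (trace + 1e-12)


def apply_beamformer(weights, stft):
    """Apply w^H y at every time-frequency bin; returns shape (F, T)."""
    return np.einsum("fc,cft->ft", weights.conj(), stft)
```

In such a pipeline, a network would predict a speech mask (and, for example, use its complement as a noise mask), and the two masks would be passed to `masked_covariance` to obtain `phi_s` and `phi_n`. The approach described in the abstract instead learns the spatial filter jointly with the mask, avoiding the explicit per-frequency solve.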