Two-stage UNet with channel and temporal-frequency attention for multi-channel speech enhancement

Shiyun Xu, Yinghan Cao, Zehua Zhang, Mingjiang Wang
DOI: https://doi.org/10.1016/j.specom.2024.103154
IF: 2.723
2024-12-01
Speech Communication
Abstract: In multi-channel speech enhancement, spectral masking and beamforming are two standard techniques. We propose a two-stage model that combines the advantages of both approaches to improve network performance. We propose a temporal-frequency self-attention block that extracts speech features in the temporal-frequency dimension with low complexity, and a residual efficient channel attention block, a lightweight module that learns channel attention through inter-channel interaction. In addition, we propose real and imaginary beamforming as a replacement for traditional beamforming; it estimates filter weights from the real and imaginary parts separately and makes full use of the spatial information in the data. Experimental results show that the model outperforms other models on both the L3DAS22 dataset and the spatial DNS dataset. Furthermore, our model demonstrates superior denoising and dereverberation under various noise and reverberation conditions.
computer science, interdisciplinary applications; acoustics
What problem does this paper attempt to address?