Two-Stage UNet with Multi-Axis Gated Multilayer Perceptron for Monaural Noisy-Reverberant Speech Enhancement

Zehua Zhang, Shiyun Xu, Xuyi Zhuang, Lianyu Zhou, Heng Li, Mingjiang Wang
DOI: https://doi.org/10.1109/icassp49357.2023.10095657
2023-01-01
Abstract: In denoising and dereverberation tasks, the dominant methods are complex spectral masking and complex spectral mapping. To combine the advantages of both and improve speech enhancement performance, we propose a two-stage UNet (TSUNet) that estimates both a complex spectral mask and a complex spectral mapping. We use a multi-axis gated multilayer perceptron to build global and local attention modules of linear complexity for extracting speech features. Furthermore, we use a residual channel attention block to further emphasize important speech features. On the blind test dataset of the Deep Noise Suppression Challenge, the proposed TSUNet substantially outperforms other state-of-the-art models. TSUNet also performs significantly better than the most recent models at noisy-reverberant speech enhancement.
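The two estimation targets named in the abstract can be illustrated with a minimal NumPy sketch. This is not the TSUNet architecture: the function names, the toy spectrogram shape, and the oracle mask (computed from a known clean signal) are assumptions made purely for illustration. Masking multiplies each time-frequency bin of the noisy spectrum by a complex value, while mapping predicts the real and imaginary parts of the clean spectrum directly.

```python
import numpy as np


def apply_complex_mask(noisy_spec, mask):
    """Complex spectral masking: scale and rotate each T-F bin by a complex mask."""
    return noisy_spec * mask


def stack_real_imag(spec):
    """Complex spectral mapping target: real and imaginary parts as two channels."""
    return np.stack([spec.real, spec.imag], axis=0)


def unstack_real_imag(channels):
    """Rebuild a complex spectrum from a two-channel (real, imag) array."""
    return channels[0] + 1j * channels[1]


# Toy complex spectra with a hypothetical shape (frequency bins, frames).
rng = np.random.default_rng(0)
shape = (4, 5)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise

# Oracle mask for illustration only; a real model would have to estimate it.
oracle_mask = clean / noisy
masked = apply_complex_mask(noisy, oracle_mask)

# A mapping model would regress these two channels from the noisy input.
mapping_target = stack_real_imag(clean)
mapped = unstack_real_imag(mapping_target)
```

With the oracle mask, `masked` matches `clean` up to floating-point error, which shows that both formulations can represent the same clean spectrum; they differ in what the network is asked to predict.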