NAAGN: Noise-Aware Attention-Gated Network for Speech Enhancement

Feng Deng, Tao Jiang, Xiao-Rui Wang, Chen Zhang, Yan Li
DOI: https://doi.org/10.21437/interspeech.2020-1133
2020-01-01
Abstract: For single-channel speech enhancement, contextual information is important for accurate speech estimation. To capture long-term temporal context, we treat speech enhancement as a sequence-to-sequence mapping problem and propose a noise-aware attention-gated network (NAAGN). First, by incorporating deep residual learning and dilated convolutions into the U-Net architecture, we present a deep residual U-Net (ResUNet), which significantly expands the receptive field to aggregate context information systematically. Second, an attention-gated (AG) network is integrated into the ResUNet architecture with minimal computational overhead, further increasing sensitivity to long-term context and prediction accuracy. Third, we propose a novel noise-aware multi-task loss function, named the weighted mean absolute error (WMAE) loss, which takes both the speech estimation loss and the noise prediction loss into account. Finally, the proposed NAAGN model is evaluated on the Voice Bank corpus and the DEMAND database, which have been widely used for speech enhancement by many deep learning models. Experimental results indicate that NAAGN achieves a larger segmental SNR improvement, better speech quality, and higher speech intelligibility than the reference methods.
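The abstract describes the WMAE loss only as a combination of a speech estimation loss and a noise prediction loss and does not give its formula. As a rough illustration of that idea, the sketch below forms a convex combination of two mean absolute errors. The function names, the fixed weight `alpha`, and the choice to pass the noise estimate as a separate input are all assumptions made for this example, not the authors' definition.

```python
from typing import Sequence


def mean_absolute_error(estimate: Sequence[float], target: Sequence[float]) -> float:
    """Average of |estimate - target| over all samples."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    if len(target) == 0:
        raise ValueError("inputs must not be empty")
    return sum(abs(e - t) for e, t in zip(estimate, target)) / len(target)


def weighted_mae_loss(
    speech_estimate: Sequence[float],
    clean_speech: Sequence[float],
    noise_estimate: Sequence[float],
    true_noise: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Hypothetical noise-aware multi-task loss.

    Assumption: the two terms are mixed with a fixed weight alpha in [0, 1].
    The paper's actual weighting scheme is not stated in the abstract.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    speech_term = mean_absolute_error(speech_estimate, clean_speech)
    noise_term = mean_absolute_error(noise_estimate, true_noise)
    return alpha * speech_term + (1.0 - alpha) * noise_term


# Toy usage with two-sample signals.
loss = weighted_mae_loss(
    speech_estimate=[1.5, 2.0],
    clean_speech=[1.0, 2.0],
    noise_estimate=[0.0, 0.0],
    true_noise=[0.0, 1.0],
    alpha=0.5,
)
print(loss)
```

Setting `alpha` to 1 in this sketch recovers a plain speech-only MAE, which makes it easy to compare against a single-task baseline.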