Multi-Stage Progressive Speech Enhancement Network

Xinmeng Xu, Yang Wang, Dongxiang Xu, Yiyuan Peng, Cong Zhang, Jie Jia, Binbin Chen
DOI: https://doi.org/10.21437/interspeech.2021-520
2021-01-01
Abstract: Speech enhancement is a fundamental technique for recovering clean speech in adverse environments where the received signal is seriously corrupted by noise. This paper presents a novel progressive network for speech enhancement built on a multi-stage structure, in which each stage contains a channel attention block followed by a dilated encoder-decoder convolutional network with gated linear units. In addition, each stage generates a prediction that is refined by a supervised attention block, and a fusion block is inserted between the original inputs and the outputs of the previous stage. The multi-stage architecture sequentially invokes multiple deep-learning networks, and its key ingredient is the information exchange between stages, which yields more flexible and robust outputs. Experimental results show that the proposed architecture performs consistently better than recent state-of-the-art models in terms of both PESQ and STOI scores.
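The data flow described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The gated linear unit follows the standard definition (values multiplied element-wise by a sigmoid of the gates). Everything else is an assumption made for illustration: `fuse` is a hypothetical stand-in that simply averages the original input with the previous stage's prediction, `toy_stage` is a placeholder for a full stage, and the channel attention and supervised attention blocks are omitted because the abstract does not specify them.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function used as the gate activation."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_linear_unit(values: Sequence[float], gates: Sequence[float]) -> Vector:
    """Standard GLU: element-wise product of values with sigmoid(gates)."""
    if len(values) != len(gates):
        raise ValueError("values and gates must have the same length")
    return [v * sigmoid(g) for v, g in zip(values, gates)]


def fuse(original: Sequence[float], previous: Sequence[float]) -> Vector:
    """Hypothetical fusion block (assumption): average of the original
    input and the previous stage's prediction."""
    return [0.5 * (o + p) for o, p in zip(original, previous)]


def run_stages(
    noisy: Sequence[float],
    stages: Sequence[Callable[[Vector], Vector]],
) -> List[Vector]:
    """Invoke the stages in sequence and return every stage's prediction.

    The first stage sees the noisy input; each later stage sees a fusion
    of the original input and the previous stage's prediction.
    """
    predictions: List[Vector] = []
    stage_input = list(noisy)
    for stage in stages:
        prediction = stage(stage_input)
        predictions.append(prediction)
        stage_input = fuse(noisy, prediction)
    return predictions


def toy_stage(features: Vector) -> Vector:
    """Placeholder stage (assumption): gates each feature by itself."""
    return gated_linear_unit(features, features)


if __name__ == "__main__":
    for i, pred in enumerate(run_stages([0.0, 2.0], [toy_stage, toy_stage]), 1):
        print(f"stage {i}: {pred}")
```

In a real model each stage would be a trained network and the fusion would be learned. The sketch only shows how the original input is reintroduced at every stage, so that later stages refine earlier predictions rather than starting from them alone.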