Research on Speech Enhancement based on Full-scale Connection

Hongyan Chen, Yan Hu
DOI: https://doi.org/10.1145/3501409.3501474
2021-10-22
Abstract: Popular monaural speech enhancement models based on an encoder-decoder structure do not make full use of full-scale features. To address this problem, a full-scale feature connected speech enhancement model, FSC-SENet, is proposed. First, this paper constructs a speech enhancement model based on the CRN architecture: a convolutional encoder and decoder extract features and recover the speech signal, and LSTM modules extract temporal features at the bottleneck of the model. Then a full-scale connection method and a multi-feature dynamic fusion mechanism are proposed, so that the decoder can make full use of the full-scale features to recover clean speech during decoding. Experimental results on the TIMIT corpus show that, compared with CRN, FSC-SENet improves the PESQ score by 0.39 and the STOI score by 2.8% under seen noise conditions, and the PESQ score by 0.43 and the STOI score by 3.1% under unseen noise conditions. These results demonstrate that the proposed full-scale connection and dynamic feature fusion mechanism improve the speech enhancement performance of CRN.
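The abstract does not spell out how the full-scale connection or the dynamic fusion is computed, so the following is only a minimal illustrative sketch of the general idea: features from every encoder scale are brought to the resolution of one decoder stage and combined with normalized weights. The function names (`resample`, `softmax`, `fuse_full_scale`), the nearest-neighbour resampling, and the softmax weighting are all assumptions made for illustration, not the authors' implementation; in a trained model the `scores` would presumably be learned parameters.

```python
import math

def resample(feature, target_len):
    """Nearest-neighbour resampling of a 1-D feature vector to target_len.
    (Assumed alignment step; the paper may use pooling or transposed convolution.)"""
    src_len = len(feature)
    return [feature[min(src_len - 1, i * src_len // target_len)]
            for i in range(target_len)]

def softmax(scores):
    """Numerically stable softmax, used here to turn scores into fusion weights."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_full_scale(encoder_features, target_len, scores):
    """Fuse features from every encoder scale into one vector of length target_len.
    `scores` is a hypothetical stand-in for learned per-scale fusion parameters."""
    weights = softmax(scores)
    aligned = [resample(f, target_len) for f in encoder_features]
    return [sum(w * a[i] for w, a in zip(weights, aligned))
            for i in range(target_len)]

# Toy encoder outputs at three scales (lengths 8, 4, 2).
features = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    [10.0, 20.0, 30.0, 40.0],
    [100.0, 200.0],
]

# Equal scores give equal weights, so the result is a plain average of the
# three scale-aligned vectors at the decoder stage of length 4.
fused = fuse_full_scale(features, target_len=4, scores=[0.0, 0.0, 0.0])
print(fused)
```

Unlike a standard skip connection, which passes only the same-scale encoder output to each decoder stage, this sketch lets a single decoder stage see all encoder scales at once, which is the property the abstract attributes to the full-scale connection.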