Filtering and Refining: A Collaborative-Style Framework for Single-Channel Speech Enhancement.
Andong Li, Chengshi Zheng, Guochen Yu, Juanjuan Cai, Xiaodong Li
DOI: https://doi.org/10.1109/taslp.2022.3184889
2022-01-01
Abstract: In low signal-to-noise ratio (SNR) acoustic scenarios, it remains challenging to extract the target speech from its noisy mixture. In this paper, we propose a collaborative-style framework, the filtering and refining network (FRNet), for single-channel speech enhancement, which recovers the complex spectrum of the target speech from both coarse and fine-grained perspectives. Specifically, we devise a two-branch structure dubbed the filtering-refining module (FRM). In the filtering block, the phase is left untouched and we focus only on coarse filtering in the magnitude domain. In the refining block, instead of predicting the irregular phase distribution directly, we estimate a complex residual for phase modification and spectrum rehabilitation; this residual retains the harmonic structure but has a rather sparse energy distribution. By cascading FRMs repeatedly, we reconstruct the target spectrum progressively. Furthermore, we propose a two-stream feature encoder that extracts feature representations of magnitude and phase separately, and we use feature recalibration layers to preserve the prominent information from multiple scales. Extensive experiments are conducted on the WSJ0-SI84, Voicebank+Demand, and DNS-Challenge corpora. Evaluation results show that the proposed system performs favorably against previous advanced systems and achieves overall state-of-the-art performance in terms of PESQ, ESTOI, SDR, and DNSMOS.
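The filter-then-refine idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the FRNet implementation: the function names (`filter_step`, `refine_step`, `frm_step`, `cascade`, `oracle_gain_and_residual`) are hypothetical, the gain and residual are computed from a known target rather than predicted by a network, and the way stages are chained (each stage consuming the previous estimate) is an assumption made for illustration only.

```python
import numpy as np

def filter_step(spec, gain):
    """Coarse filtering: scale the magnitude and keep the input phase unchanged."""
    mag = np.abs(spec)
    phase = np.angle(spec)
    return gain * mag * np.exp(1j * phase)

def refine_step(coarse, residual):
    """Refinement: add a complex residual, which can change magnitude and phase."""
    return coarse + residual

def frm_step(spec, gain, residual):
    """One filter-then-refine step (illustrative stand-in for a single FRM)."""
    return refine_step(filter_step(spec, gain), residual)

def cascade(spec, gains, residuals):
    """Apply several steps in sequence (assumed chaining, see the lead-in)."""
    est = spec
    for gain, residual in zip(gains, residuals):
        est = frm_step(est, gain, residual)
    return est

def oracle_gain_and_residual(noisy, target, eps=1e-8):
    """Oracle quantities computed from a known target, for demonstration only.

    In a learned system these would be network outputs; here the gain is a
    magnitude ratio clipped to [0, 1] and the residual is whatever the
    magnitude-only filter could not recover.
    """
    gain = np.clip(np.abs(target) / (np.abs(noisy) + eps), 0.0, 1.0)
    residual = target - filter_step(noisy, gain)
    return gain, residual

# Toy complex "spectrograms" (frequency bins x frames) from random data.
rng = np.random.default_rng(0)
target = rng.standard_normal((5, 8)) + 1j * rng.standard_normal((5, 8))
noisy = target + 0.5 * (rng.standard_normal((5, 8)) + 1j * rng.standard_normal((5, 8)))

gain, residual = oracle_gain_and_residual(noisy, target)
coarse = filter_step(noisy, gain)          # magnitude filtered, noisy phase kept
refined = refine_step(coarse, residual)    # phase and remaining detail corrected
```

In this toy setting the coarse estimate still carries the noisy phase, and the residual is exactly the part a magnitude-only filter cannot reach, which is why the abstract pairs the two branches.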