CRA-DIFFUSE: IMPROVED CROSS-DOMAIN SPEECH ENHANCEMENT BASED ON DIFFUSION MODEL WITH T-F DOMAIN PRE-DENOISING

Zhibin Qiu, Yachao Guo, Mengfan Fu, Hao Huang, Ying Hu, Liang He, Fuchun Sun
DOI: https://doi.org/10.1109/icme55011.2023.00294
2023-01-01
Abstract: Speech enhancement (SE) methods in the time-frequency (T-F) domain and in the time domain each have their own advantages. Leveraging both T-F domain and time-domain (cross-domain) inputs has been shown to be successful in the speech enhancement task. Recent SE methods based on diffusion models have shown promising results. However, little research effort has been devoted to cross-domain speech enhancement using a diffusion model. We propose CRA-DiffuSE, a cross-domain SE model that uses a diffusion-based enhancement model as a refinement module after initial enhancement to achieve better results. For the pre-enhancement stage, we design CRANet, a T-F domain enhancement model combining channel attention and spatial attention. For the post-enhancement stage, we design DiffuNet, a conditional generation model based on the Denoising Diffusion Implicit Model (DDIM) for speech enhancement. Experiments demonstrate that the proposed CRA-DiffuSE is significantly superior to the baselines.
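The abstract does not specify DiffuNet's architecture, noise schedule, or how the pre-enhanced signal is fed in, so the following is only a minimal sketch of the generic deterministic DDIM update (eta = 0) that such a refinement stage could build on. The names `ddim_step`, `refine`, `eps_model`, and `pre_enhanced` are hypothetical, the linear noise schedule is an assumption, and the "oracle" noise predictor is a toy stand-in for a trained network.

```python
import numpy as np

def ddim_step(x_t, eps_hat, abar_t, abar_prev):
    """One deterministic DDIM update (eta = 0).

    abar_t and abar_prev are cumulative products of (1 - beta) at the
    current and the next (less noisy) step.
    """
    # Estimate the clean signal implied by the predicted noise.
    x0_hat = (x_t - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)
    # Re-noise that estimate to the lower noise level.
    return np.sqrt(abar_prev) * x0_hat + np.sqrt(1.0 - abar_prev) * eps_hat

def refine(x_T, pre_enhanced, eps_model, abar):
    """Run the reverse process from the noisiest step down to a clean estimate.

    pre_enhanced is the conditioning signal (hypothetically, the output of a
    first-stage T-F enhancer); eps_model(x, cond, t) is an assumed interface.
    """
    x = x_T
    for t in range(len(abar) - 1, -1, -1):
        eps_hat = eps_model(x, pre_enhanced, t)
        abar_prev = abar[t - 1] if t > 0 else 1.0  # final step targets zero noise
        x = ddim_step(x, eps_hat, abar[t], abar_prev)
    return x

# Toy check: with an oracle that returns the true noise, the sampler
# recovers the clean signal up to floating-point error.
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 8.0 * np.pi, 256))
noise = rng.standard_normal(256)
betas = np.linspace(1e-4, 0.05, 50)   # assumed linear schedule
abar = np.cumprod(1.0 - betas)
x_T = np.sqrt(abar[-1]) * clean + np.sqrt(1.0 - abar[-1]) * noise

oracle = lambda x, cond, t: noise     # stand-in for a trained noise predictor
recovered = refine(x_T, pre_enhanced=x_T, eps_model=oracle, abar=abar)
```

In a real system the oracle would be replaced by a trained conditional network, and the quality of the result would depend on how well that network predicts the noise given the pre-enhanced input.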