SRTNET: Time Domain Speech Enhancement Via Stochastic Refinement

Zhibin Qiu, Mengfan Fu, Yinfeng Yu, Lili Yin, Fuchun Sun, Hao Huang
DOI: https://doi.org/10.1109/icassp49357.2023.10095850
2022-01-01
Abstract: Diffusion models, a new class of generative models that are very popular in image generation and audio synthesis, are rarely used in speech enhancement. In this paper, we use a diffusion model as a module for stochastic refinement. We propose SRTNet, a novel method for speech enhancement via Stochastic Refinement in the complete Time domain. Specifically, we design a joint network consisting of a deterministic module and a stochastic module, which together form the "enhance-and-refine" paradigm. We theoretically demonstrate the feasibility of our method and show experimentally that it achieves faster training, faster sampling, and higher enhancement quality. Our code is available at https://github.com/zhibinQiu/SRTNet.git
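The abstract names the two modules of the "enhance-and-refine" paradigm but not their internals. The following is a minimal data-flow sketch under stated assumptions, not the SRTNet implementation: it assumes the stochastic module is a DDPM-style reverse process that samples a residual which is added to the deterministic estimate (a formulation borrowed from other stochastic-refinement work and not confirmed by this abstract). Every name (`deterministic_enhance`, `stochastic_refine`, `make_oracle_eps`, the linear noise schedule and its values) is hypothetical. A 3-tap moving average stands in for the deterministic network, and an oracle noise predictor stands in for the trained diffusion network so the script runs without weights.

```python
import math
import random
from typing import Callable, List

Signal = List[float]
# eps_model(x_t, t, condition) -> predicted noise; a trained network in practice.
EpsModel = Callable[[Signal, int, Signal], Signal]


def deterministic_enhance(noisy: Signal) -> Signal:
    """Hypothetical stand-in for the deterministic module: a 3-tap moving average."""
    n = len(noisy)
    out = []
    for i in range(n):
        window = noisy[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out


def linear_betas(steps: int, beta_start: float = 1e-4, beta_end: float = 0.05) -> List[float]:
    """Linear noise schedule beta_1..beta_T (values are illustrative assumptions)."""
    return [beta_start + (beta_end - beta_start) * k / (steps - 1) for k in range(steps)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def stochastic_refine(estimate: Signal, eps_model: EpsModel,
                      betas: List[float], rng: random.Random) -> Signal:
    """DDPM ancestral sampling of a residual r conditioned on `estimate`.

    Assumption: the refined output is estimate + r (residual formulation).
    """
    alpha_bars = cumulative_alphas(betas)
    x = [rng.gauss(0.0, 1.0) for _ in estimate]  # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        alpha_t = 1.0 - betas[t]
        eps = eps_model(x, t, estimate)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alpha_t) for xi, ei in zip(x, eps)]
        if t > 0:
            # Posterior variance: beta_t * (1 - alpha_bar_{t-1}) / (1 - alpha_bar_t).
            var = betas[t] * (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t])
            x = [m + math.sqrt(var) * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # the last reverse step adds no noise
    return [e + r for e, r in zip(estimate, x)]


def enhance_and_refine(noisy: Signal, eps_model: EpsModel,
                       betas: List[float], seed: int = 0) -> Signal:
    """Enhance (deterministic module), then refine (stochastic module)."""
    estimate = deterministic_enhance(noisy)
    return stochastic_refine(estimate, eps_model, betas, random.Random(seed))


def make_oracle_eps(residual: Signal, betas: List[float]) -> EpsModel:
    """Hypothetical oracle that knows the true residual r0, so the demo runs without a
    trained network: eps = (x_t - sqrt(alpha_bar_t) * r0) / sqrt(1 - alpha_bar_t)."""
    alpha_bars = cumulative_alphas(betas)

    def eps_model(x: Signal, t: int, _condition: Signal) -> Signal:
        a = alpha_bars[t]
        return [(xi - math.sqrt(a) * r) / math.sqrt(1.0 - a) for xi, r in zip(x, residual)]

    return eps_model


if __name__ == "__main__":
    rng = random.Random(1)
    clean = [math.sin(0.3 * i) for i in range(32)]
    noisy = [c + rng.gauss(0.0, 0.2) for c in clean]
    betas = linear_betas(50)
    residual = [c - e for c, e in zip(clean, deterministic_enhance(noisy))]
    refined = enhance_and_refine(noisy, make_oracle_eps(residual, betas), betas)
    print("max |refined - clean|:", max(abs(r - c) for r, c in zip(refined, clean)))
```

Because the oracle predicts the noise exactly, the reverse process collapses onto the true residual and the refined output matches the clean signal up to floating-point error; with a trained noise predictor the refinement would instead return a stochastic sample. Keeping the two modules behind separate functions mirrors the split the abstract describes, so either stand-in can be swapped for a real network without touching the other.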
What problem does this paper attempt to address?