Shallow Diffusion for Fast Speech Enhancement (Student Abstract)

Yue Lei, Bin Chen, Wenxin Tai, Ting Zhong, Fan Zhou
DOI: https://doi.org/10.1609/aaai.v38i21.30471
2024-01-01
Abstract: Recently, diffusion-based generative models have achieved notable success in speech enhancement. However, these methods typically require many iterations to generate high-quality samples, leading to high computational cost and slow inference. In this paper, we propose SDFEN (Shallow Diffusion for Fast spEech eNhancement), a novel approach that addresses this inefficiency while improving the quality of generated samples by reducing the number of iterative steps in the reverse process of the diffusion method. Specifically, we introduce a shallow diffusion strategy that initiates the reverse process at an adaptive time step to accelerate inference. In addition, we propose a dedicated noisy predictor to guide the adaptive selection of the time step. Experimental results demonstrate the superiority of the proposed SDFEN in both effectiveness and efficiency.
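The idea of starting the reverse process at an adaptive time step can be illustrated with a minimal sketch. The abstract does not specify the noise schedule, the form of the predictor output, or how that output is mapped to a time step, so everything below is an assumption made for illustration: a DDPM-style linear beta schedule, a hypothetical predictor output `noise_ratio` in [0, 1], a nearest-match selection rule, and a placeholder `denoise_step` callable standing in for a trained network. It is not the authors' implementation.

```python
import numpy as np


def make_alpha_bar(num_steps=50, beta_start=1e-4, beta_end=0.05):
    """Cumulative signal-retention schedule for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def select_start_step(noise_ratio, alpha_bar):
    """Pick the step whose diffusion noise level is closest to the estimate.

    noise_ratio is a hypothetical predictor output in [0, 1]: the estimated
    fraction of noise variance in the noisy input.
    """
    # In a DDPM-style forward process, x_t carries noise variance 1 - alpha_bar[t].
    diffusion_noise = 1.0 - alpha_bar
    return int(np.argmin(np.abs(diffusion_noise - noise_ratio)))


def shallow_reverse(noisy, denoise_step, start_step):
    """Run the reverse process from start_step down to 0 instead of from the last step."""
    x = noisy
    for t in range(start_step, -1, -1):
        x = denoise_step(x, t)
    return x


if __name__ == "__main__":
    alpha_bar = make_alpha_bar()
    start = select_start_step(noise_ratio=0.3, alpha_bar=alpha_bar)

    calls = []

    def dummy_denoise_step(x, t):
        # Placeholder for a trained denoiser; it only records the step index.
        calls.append(t)
        return 0.9 * x

    enhanced = shallow_reverse(np.ones(4), dummy_denoise_step, start)
    print(f"full process: {len(alpha_bar)} steps, shallow process: {len(calls)} steps")
```

Under these assumptions, a mildly noisy input maps to a small starting step and so needs only a few reverse iterations, while a heavily corrupted input falls back towards the full-length process.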