Super Denoise Net: Speech Super Resolution with Noise Cancellation in Low Sampling Rate Noisy Environments

Junkang Yang, Hongqing Liu, Lu Gan, Yi Zhou
2023-10-10
Abstract: Speech super-resolution (SSR) aims to predict a high-resolution (HR) speech signal from its low-resolution (LR) counterpart. Most neural SSR models focus on producing the final result in a noise-free environment by recovering the spectrogram of the high-frequency part of the signal and concatenating it with the original low-frequency part. Although these methods achieve high accuracy, they become less effective in real-world scenarios, where noise is unavoidable. To address this problem, we propose Super Denoise Net (SDNet), a neural network for the joint task of super-resolution and noise reduction from a low sampling rate signal. To that end, we design gated convolution and lattice convolution blocks to enhance the repair capability and capture information along the time-frequency axis, respectively. The experiments show that our method outperforms baseline speech denoising and SSR models on the DNS 2020 no-reverb test set, with higher objective and subjective scores.
Audio and Speech Processing, Sound
What problem does this paper attempt to address?
The paper attempts to address the joint task of speech super-resolution (SSR) and denoising in low sampling rate noisy environments. Most existing neural network SSR models primarily focus on generating the final result in noise-free environments by restoring the high-frequency part of the spectrogram and stitching it with the original low-frequency part. However, these methods perform poorly when faced with the inevitable noise in real-world scenarios. To this end, the authors propose a neural network model named Super Denoise Net (SDNet), which aims to simultaneously perform super-resolution and denoising from low sampling rate signals. Specifically, the paper points out that in noisy environments, existing methods not only fail to effectively remove noise but also produce biased predictions for the high-frequency part due to noise interference. To improve this situation, SDNet designs gated convolution and lattice convolution blocks, which are used to enhance restoration capabilities and capture information on the time-frequency axis, respectively. Experimental results show that this method outperforms baseline speech denoising and SSR models in both objective evaluation metrics and subjective scores on the DNS 2020 non-reverberant test set.
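The summary names gated convolution but does not describe how SDNet's block is built. As a rough illustration of the general idea only, the sketch below uses a common formulation in which one convolution produces features and a second produces a sigmoid gate that scales them elementwise. The function names, the 1-D setting, and the kernels are assumptions chosen for illustration and are not taken from the paper.

```python
import math


def conv1d_same(x, kernel):
    """Zero-padded 1-D cross-correlation; output has the same length as x.

    Assumes an odd-length kernel so the padding is symmetric.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def sigmoid(v):
    """Logistic function mapping a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def gated_conv1d(x, feature_kernel, gate_kernel):
    """Generic gated convolution (hypothetical helper, not SDNet's block).

    The feature branch is a plain convolution; the gate branch is a second
    convolution passed through a sigmoid. The output is their elementwise
    product, so the gate decides how much of each feature passes through.
    """
    features = conv1d_same(x, feature_kernel)
    gates = [sigmoid(g) for g in conv1d_same(x, gate_kernel)]
    return [f * g for f, g in zip(features, gates)]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0]
    # Identity feature kernel with an all-zero gate kernel: every gate is 0.5.
    print(gated_conv1d(signal, [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]))
```

In a trained network both kernels would be learned, so the gate can suppress regions dominated by noise while passing regions that carry speech. That is one plausible reading of how gating could help the "repair" role mentioned above, not a claim about the authors' design.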