Abstract: Estimating unseen noise is a key yet challenging step in making a speech enhancement algorithm work in adverse environments. In the worst case, the only prior knowledge available about the encountered noise is that it differs from the speech involved. Therefore, by subtracting the components that cannot be adequately represented by a well-defined speech model, the noise can be estimated and removed. Given the good performance of deep learning in signal representation, a deep auto encoder (DAE) is employed in this work to accurately model the clean speech spectrum. In the subsequent speech enhancement stage, an extra DAE is introduced to represent the residual obtained by subtracting the estimated clean speech spectrum (produced by the pre-trained DAE) from the noisy speech spectrum. By adjusting the estimated clean speech spectrum and the unknown parameters of the noise DAE, one can reach a stationary point that minimizes the total reconstruction error of the noisy speech spectrum. The enhanced speech signal is then obtained by transforming the estimated clean speech spectrum back into the time domain. The proposed technique is called the separable deep auto encoder (SDAE). Because this optimization problem is under-determined, the clean speech reconstruction is confined to the convex hull spanned by a pre-trained speech dictionary. New learning algorithms are investigated to respect the non-negativity of the parameters in the SDAE. Experimental results on TIMIT with 20 noise types at various noise levels demonstrate the superiority of the proposed method over the conventional baselines.
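
The convex-hull constraint mentioned in the abstract can be illustrated with a small sketch. This is not the learning algorithm of the paper, which the abstract does not specify. It only assumes that a speech estimate is a weighted sum of dictionary atoms whose weights are non-negative and sum to one, and it uses the standard sort-based Euclidean projection onto the probability simplex to enforce that. The names `project_to_simplex`, `convex_combination`, and `atoms`, and the toy values, are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_k >= 0, sum_k w_k = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            tau = candidate  # the last j passing this test sets the threshold
    return [max(x - tau, 0.0) for x in v]


def convex_combination(atoms, weights):
    """Weighted sum of dictionary atoms, one value per frequency bin."""
    n_bins = len(atoms[0])
    return [
        sum(w * atom[i] for w, atom in zip(weights, atoms))
        for i in range(n_bins)
    ]


# Toy dictionary with two atoms and three frequency bins (made-up values).
atoms = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]
# Unconstrained weights, e.g. after a gradient step, pulled back to the simplex.
weights = project_to_simplex([1.4, -0.2])
speech_estimate = convex_combination(atoms, weights)
```

In a projected-gradient scheme, a projection of this kind would follow each update of the weights, so that the speech estimate never leaves the convex hull of the dictionary atoms.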

Unsupervised speech enhancement with deep dynamical generative speech and noise models

Unsupervised Speech Enhancement using Dynamical Variational Auto-Encoders

Unseen Noise Estimation Using Separable Deep Auto Encoder for Speech Enhancement

Dynamic noise aware training for speech enhancement based on deep neural networks

Unsupervised speech enhancement with diffusion-based generative models

A Conditional Generative Model for Speech Enhancement

Unsupervised Noise adaptation using Data Simulation

A regression approach to speech enhancement based on deep neural networks

Unsupervised Speech Enhancement Using Optimal Transport and Speech Presence Probability

A Novel Training Strategy Using Dynamic Data Generation for Deep Neural Network Based Speech Enhancement

Diffusion-based Unsupervised Audio-visual Speech Enhancement

Adversarial Feature Learning and Unsupervised Clustering based Speech Synthesis for Found Data with Acoustic and Textual Noise

A Recurrent Variational Autoencoder for Speech Enhancement

Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions

MetricGAN-U: Unsupervised Speech Enhancement/ Dereverberation Based Only on Noisy/ Reverberated Speech

A variance modeling framework based on variational autoencoders for speech enhancement

A Joint Framework of Denoising Autoencoder and Generative Vocoder for Monaural Speech Enhancement

DOSE: Diffusion Dropout with Adaptive Prior for Speech Enhancement

Improving Deep Neural Network Based Speech Enhancement in Low SNR Environments

Collaborative Deep Learning for Speech Enhancement: A Run-Time Model Selection Method Using Autoencoders

Dynamic Noise Embedding: Noise Aware Training and Adaptation for Speech Enhancement