Abstract: In contrast to conventional minimum mean square error (MMSE)-based noise reduction techniques, we propose a supervised method that enhances speech by learning a mapping function between noisy and clean speech signals with deep neural networks (DNNs). To handle a wide range of additive noises in real-world situations, we first design a large training set that encompasses many possible combinations of speech and noise types. A DNN architecture is then employed as a nonlinear regression function to ensure a powerful modeling capability. Several techniques are also proposed to improve the DNN-based speech enhancement system, including global variance equalization to alleviate the over-smoothing problem of the regression model, and dropout and noise-aware training strategies to further improve the generalization capability of DNNs to unseen noise conditions. Experimental results demonstrate that the proposed framework achieves significant improvements in both objective and subjective measures over the conventional MMSE-based technique. The proposed DNN approach also suppresses highly nonstationary noise well, which is generally difficult to handle. Furthermore, the resulting DNN model, trained with artificially synthesized data, is also effective on noisy speech recorded in real-world scenarios, without generating the annoying musical artifacts commonly observed in conventional enhancement methods.
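
The abstract does not say how the noisy training data are synthesized. A common way to build noisy/clean pairs for a regression model of this kind is to add a noise recording to clean speech at a chosen signal-to-noise ratio (SNR). The sketch below illustrates that idea only; the function names `mix_at_snr` and `measured_snr_db`, the toy sine-wave "speech", the Gaussian noise, and the SNR values are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Return clean + g * noise, with g chosen so the mixture has SNR `snr_db` (dB).

    Hypothetical helper: a generic additive-noise mixer, not the paper's pipeline.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if clean_power == 0 or noise_power == 0:
        raise ValueError("signals must have nonzero power")
    # Solve clean_power / (g^2 * noise_power) = 10^(snr_db / 10) for g.
    gain = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of `noisy`, treating (noisy - clean) as the noise component."""
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(clean_power / noise_power)


# Toy stand-ins for real recordings (assumed values, for illustration only).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1600)]

# One noisy version of the same clean signal per SNR level.
pairs = {snr: mix_at_snr(clean, noise, snr) for snr in (0, 5, 10)}
```

Each entry of `pairs` is a noisy input whose regression target would be the corresponding clean signal (or features derived from it). Repeating the mixing over many noise types and SNR levels is one way to obtain the kind of broad coverage the abstract describes.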

Iterative Noisy-Target Approach: Speech Enhancement Without Clean Speech

Noisy-target Training: A Training Strategy for DNN-based Speech Enhancement without Clean Speech

DENOISPEECH: DENOISING TEXT TO SPEECH WITH FRAME-LEVEL NOISE MODELING

Dynamic noise aware training for speech enhancement based on deep neural networks.

Noisy training for deep neural networks in speech recognition

A Novel Training Strategy Using Dynamic Data Generation for Deep Neural Network Based Speech Enhancement.

Deep Neural Network Based Noised Asian Speech Enhancement and Its Implementation on a Hearing Aid App.

Neural Speech Enhancement with Unsupervised Pre-Training and Mixture Training

Unsupervised Noise adaptation using Data Simulation

Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement

A Progressive Learning Approach to Adaptive Noise and Speech Estimation for Speech Enhancement and Noisy Speech Recognition.

A regression approach to speech enhancement based on deep neural networks

Joint Noise and Mask Aware Training for DNN-based Speech Enhancement with SUB-band Features

On Generating Mixing Noise Signals With Basis Functions For Simulating Noisy Speech And Learning Dnn-Based Speech Enhancement Models

A Joint Speech Enhancement and Self-Supervised Representation Learning Framework for Noise-Robust Speech Recognition

NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling

Masking and Inpainting: A Two-Stage Speech Enhancement Approach for Low SNR and Non-Stationary Noise

Audio Enhancement for Computer Audition—An Iterative Training Paradigm Using Sample Importance

Noise Robust TTS for Low Resource Speakers using Pre-trained Model and Speech Enhancement

Incorporating Real-world Noisy Speech in Neural-network-based Speech Enhancement Systems