Abstract: In real environments, speech is easily corrupted by external interference, which causes the loss of important features. Deep learning has become a popular approach to speech enhancement because of its strong ability to model nonlinear mappings between complex features. However, traditional deep learning methods are weak at learning important information from previous time steps and long-term dependencies in time-series data. To overcome this problem, we propose a novel speech enhancement method based on the fused features of a deep neural network (DNN) and a gated recurrent unit (GRU) network. The proposed method uses the GRU to reduce the number of parameters of the DNN and to capture the context information of the speech, which improves the quality and intelligibility of the enhanced speech. First, a DNN with multiple hidden layers learns the mapping between the log power spectrum (LPS) features of noisy speech and those of clean speech. Second, the LPS features estimated by the DNN are fused with the noisy speech and used as the input of the GRU network to compensate for the missing context information. Finally, the GRU network learns the mapping between these fused features and the LPS features of clean speech. The proposed model is compared experimentally with conventional speech enhancement models, including DNN, CNN, LSTM and GRU. Experimental results show that, under matched noise conditions, the proposed algorithm improves PESQ, SSNR and STOI by 30.72%, 39.84% and 5.53%, respectively, relative to the unprocessed noisy signal. Under unmatched noise conditions, it improves PESQ and STOI by 23.8% and 37.36%, respectively.
The advantage of the proposed method is that it exploits the key information in the features to suppress noise under both matched and unmatched noise conditions, and it outperforms other common speech enhancement methods.
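The three-step pipeline described above can be sketched as a forward pass in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The following choices are hypothetical:

- The FFT size (256) and hop (128).
- The DNN depth and width (3 hidden layers of 512 ReLU units).
- The GRU hidden size (256).
- Fusion as plain concatenation of the DNN output with the noisy LPS. The abstract does not specify the fusion operator.

Weights are random and untrained, so the output only demonstrates shapes and data flow. Training against clean LPS targets (e.g. with an MSE loss) and waveform reconstruction are omitted.

```python
import numpy as np

def lps(signal, n_fft=256, hop=128, eps=1e-10):
    """Log power spectrum log(|STFT|^2), shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + eps)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DNN:
    """Feed-forward regressor: noisy LPS frame -> estimated clean LPS frame."""
    def __init__(self, dims, rng):
        self.layers = [(rng.normal(0, 0.1, (i, o)), np.zeros(o))
                       for i, o in zip(dims[:-1], dims[1:])]

    def __call__(self, x):
        for k, (w, b) in enumerate(self.layers):
            x = x @ w + b
            if k < len(self.layers) - 1:
                x = np.maximum(x, 0.0)  # ReLU on hidden layers; output layer stays linear
        return x

class GRU:
    """Single-layer GRU run frame by frame, followed by a linear output layer."""
    def __init__(self, in_dim, hidden, out_dim, rng):
        init = lambda *shape: rng.normal(0, 0.1, shape)
        self.W = init(in_dim, 3 * hidden)   # input weights for update, reset, candidate
        self.U = init(hidden, 3 * hidden)   # recurrent weights for the same three parts
        self.b = np.zeros(3 * hidden)
        self.Wo, self.bo = init(hidden, out_dim), np.zeros(out_dim)
        self.hidden = hidden

    def __call__(self, seq):
        H = self.hidden
        h = np.zeros(H)
        outputs = []
        for x in seq:  # iterate over time frames so context carries forward in h
            gx = x @ self.W + self.b
            gh = h @ self.U
            z = sigmoid(gx[:H] + gh[:H])                 # update gate
            r = sigmoid(gx[H:2 * H] + gh[H:2 * H])       # reset gate
            n = np.tanh(gx[2 * H:] + (r * h) @ self.U[:, 2 * H:])  # candidate state
            h = (1.0 - z) * n + z * h
            outputs.append(h @ self.Wo + self.bo)
        return np.stack(outputs)

def enhance_lps(noisy_wave, dnn, gru):
    noisy = lps(noisy_wave)
    dnn_est = dnn(noisy)                               # step 1: DNN maps noisy LPS -> clean LPS
    fused = np.concatenate([dnn_est, noisy], axis=1)   # step 2: fusion (assumed: concatenation)
    return gru(fused)                                  # step 3: GRU maps fused features -> clean LPS

rng = np.random.default_rng(0)
F = 256 // 2 + 1                                       # 129 frequency bins
dnn = DNN([F, 512, 512, 512, F], rng)
gru = GRU(2 * F, 256, F, rng)
noisy_wave = rng.normal(size=16000)                    # stand-in for 1 s of noisy speech at 16 kHz
out = enhance_lps(noisy_wave, dnn, gru)
print(out.shape)  # (124, 129)
```

The GRU receives both the DNN estimate and the original noisy features at every frame. Its hidden state `h` is what carries information from earlier frames, which is the context the abstract says a frame-wise DNN lacks.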

Gaussian Density Guided Deep Neural Network for Single-Channel Speech Enhancement

Using Generalized Gaussian Distributions to Improve Regression Error Modeling for Deep Learning-Based Speech Enhancement

Error Modeling via Asymmetric Laplace Distribution for Deep Neural Network Based Single-Channel Speech Enhancement

A Maximum Likelihood Approach to Deep Neural Network Based Speech Dereverberation

A Maximum Likelihood Approach to Multi-Objective Learning Using Generalized Gaussian Distributions for DNN-Based Speech Enhancement

A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech Separation

A Regression Approach to Speech Enhancement Based on Deep Neural Networks

A Maximum Likelihood Approach to SNR-Progressive Learning Using Generalized Gaussian Distribution for LSTM-Based Speech Enhancement

Uncertainty Estimation in Deep Speech Enhancement Using Complex Gaussian Mixture Models

Dynamic Noise Aware Training for Speech Enhancement Based on Deep Neural Networks

Deep Neural Network Based Noised Asian Speech Enhancement and Its Implementation on a Hearing Aid App

An Experimental Study on Speech Enhancement Based on Deep Neural Networks

Deep Speaker Vector Normalization with Maximum Gaussianality Training

DNN Training Based on Classic Gain Function for Single-Channel Speech Enhancement and Recognition

Speech Enhancement Using a Deep Mixture of Experts

Noise Estimation Using Mean Square Cross Prediction Error for Speech Enhancement

Speech Enhancement from Fused Features Based on Deep Neural Network and Gated Recurrent Unit Network

Multi-Task Single Channel Speech Enhancement Using Speech Presence Probability as a Secondary Task Training Target

Single-Channel Speech Enhancement Algorithm Based on ME-MGCRN in Low Signal-to-Noise Scenario

Single-Channel Speech Enhancement with Deep Complex U-Networks and Probabilistic Latent Space Models

On Generating Mixing Noise Signals with Basis Functions for Simulating Noisy Speech and Learning DNN-Based Speech Enhancement Models