Single-Channel Speech Enhancement Algorithm Based on ME-MGCRN in Low Signal-to-Noise Scenario
Chaofeng Lan, Shilong Zhao, Huan Chen, Lei Zhang, Yuchen Yang, Zixu Fan, Meng Zhang
DOI: https://doi.org/10.1109/access.2024.3431713
IF: 3.9
2024-07-26
IEEE Access
Abstract: Traditional neural networks enhance speech poorly under low signal-to-noise ratio (SNR) conditions. To address this problem, this paper combines a Convolutional Recurrent Neural Network (CRN) with Gated Linear Units (GLU) to extract speech characteristics. Building on the advantages of the Empirical Mode Decomposition (EMD) algorithm, it proposes a speech enhancement model based on Adaptive Mean Median Empirical Mode Decomposition and Multilayer Gated Convolutional Recurrent Neural Networks (ME-MGCRN). The model first uses the improved EMD algorithm to split the original noisy speech into low-frequency and high-frequency components. It then denoises the noisy speech using noise correlation and completes feature extraction. Finally, the extracted features are fed into the MGCRN network for further speech enhancement. The performance of the Adaptive Mean Median Empirical Mode Decomposition algorithm is analyzed on the LibriSpeech ASR dataset. The enhancement performance of the ME-MGCRN model is compared with that of the baseline model and the traditional model using evaluation metrics such as perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). The results indicate that the proposed ME-MGCRN model improves PESQ and STOI by 0.22% and 5.6%, respectively, relative to the baseline model and the traditional model, and that speech enhancement is better when Huber is selected as the loss function.
computer science, information systems; telecommunications; engineering, electrical & electronic
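
The abstract names two standard building blocks: Gated Linear Units and the Huber loss. The following is a minimal sketch of both in plain Python, given only to illustrate the general definitions. It is not the authors' implementation: the function names, the list-based inputs, the half-and-half split used for gating, and the threshold `delta=1.0` are all assumptions made for this illustration, and the abstract does not state how either component is configured in ME-MGCRN.

```python
import math


def sigmoid(x):
    """Logistic function used as the gate in a GLU."""
    return 1.0 / (1.0 + math.exp(-x))


def glu(values):
    """Gated Linear Unit on a flat list (illustrative layout).

    The first half of `values` is the signal and the second half is the
    gate input; the output is signal * sigmoid(gate), element by element.
    """
    if len(values) % 2 != 0:
        raise ValueError("GLU input length must be even")
    half = len(values) // 2
    signal, gate = values[:half], values[half:]
    return [s * sigmoid(g) for s, g in zip(signal, gate)]


def huber_loss(predicted, target, delta=1.0):
    """Mean Huber loss between two equal-length sequences.

    Errors no larger than `delta` are penalized quadratically, larger
    errors linearly. `delta` is an assumed value, not one from the paper.
    """
    if len(predicted) != len(target):
        raise ValueError("sequences must have the same length")
    total = 0.0
    for p, t in zip(predicted, target):
        err = abs(p - t)
        if err <= delta:
            total += 0.5 * err * err
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(predicted)


if __name__ == "__main__":
    # A gate input of 0 gives sigmoid(0) = 0.5, so the signal is halved.
    print(glu([2.0, 4.0, 0.0, 0.0]))
    # One small error (quadratic branch) and one large error (linear branch).
    print(huber_loss([0.5, 3.0], [0.0, 0.0]))
```

The Huber loss behaves like mean squared error for small residuals and like mean absolute error for large ones, which makes it less sensitive to outliers than a purely quadratic loss.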