Convolutional Recurrent Neural Network with Attention for 3D Speech Enhancement

Han Yin, Jisheng Bai, Mou Wang, Siwei Huang, Yafei Jia, Jianfeng Chen
2023-11-20
Abstract: 3D speech enhancement can effectively improve the auditory experience and plays a crucial role in augmented reality technology. However, traditional convolution-based speech enhancement methods have limitations in extracting dynamic voice information. In this paper, we incorporate a dual-path recurrent neural network block into the U-Net to iteratively extract dynamic audio information in both the time and frequency domains. We also propose an attention mechanism to fuse the original signal, reference signal, and generated masks. Moreover, we introduce a loss function to simultaneously optimize the network in the time-frequency and time domains. Experimental results show that our system outperforms the state-of-the-art systems on the dataset of the ICASSP L3DAS23 challenge.
Audio and Speech Processing, Sound
What problem does this paper attempt to address?
This paper addresses 3D speech enhancement, with the goal of improving the quality and clarity of speech signals. Specifically, traditional convolution-based speech enhancement methods have limitations in extracting dynamic speech information. To solve this problem, the paper proposes a Convolutional Recurrent Neural Network (CRNN) that places a dual-path RNN block inside a U-Net structure to iteratively model the time and frequency domains, and uses an attention mechanism to fuse the original signal, the reference signal, and the generated masks. Additionally, the paper introduces a loss function that optimizes the network in the time-frequency and time domains simultaneously. Experimental results show that this system outperforms existing state-of-the-art systems on the ICASSP L3DAS23 challenge dataset.
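To make the idea of optimizing in the time-frequency and time domains at once more concrete, here is a minimal sketch in plain Python. It is not the paper's loss: the summary above gives no formula, so the L1 distance in both domains, the weight `alpha`, the Hann window, and the names `stft_magnitude` and `combined_loss` are all assumptions chosen for illustration.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=8, hop=4):
    """Naive STFT magnitude: Hann-windowed frames, direct DFT per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # A real signal only needs the non-negative frequency bins.
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def mean_abs_error(a, b):
    """Mean absolute (L1) difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def combined_loss(estimate, target, alpha=0.5, frame_len=8, hop=4):
    """Weighted sum of a time-domain term and a time-frequency term.

    Assumption: alpha and the L1 distance are illustrative choices,
    not values or definitions taken from the paper.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    if len(target) < frame_len:
        raise ValueError("signals must be at least one frame long")

    # Time-domain term: compare the waveforms sample by sample.
    time_term = mean_abs_error(estimate, target)

    # Time-frequency term: compare STFT magnitudes bin by bin.
    est_mag = [v for frame in stft_magnitude(estimate, frame_len, hop) for v in frame]
    tgt_mag = [v for frame in stft_magnitude(target, frame_len, hop) for v in frame]
    tf_term = mean_abs_error(est_mag, tgt_mag)

    return alpha * time_term + (1 - alpha) * tf_term


# Usage: a clean sinusoid versus a copy with a constant offset.
clean = [math.sin(2 * math.pi * 3 * n / 32) for n in range(32)]
noisy = [x + 0.1 for x in clean]
loss_value = combined_loss(noisy, clean)
```

The time-domain term penalizes waveform errors directly, while the magnitude term penalizes spectral errors; a weighted sum is one simple way to let both drive training. In a real system, both terms would be computed with a differentiable framework so that gradients can flow back to the network.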