A Nested U-Net with Efficient Channel Attention and D3Net for Speech Enhancement

DOI: https://doi.org/10.1007/s00034-023-02300-x
2023-02-10
Abstract: Deep learning has vastly improved speech enhancement. However, performance is still limited because widely used techniques cannot fully exploit contextual information from multiple scales. To address this issue, we propose a nested U-Net with efficient channel attention and D3Net (ECAD3MUNet) for speech enhancement. The proposed ECAD3MUNet is an encoder-decoder model with skip connections that improve information flow. In ECAD3MUNet, a novel densely connected dilated DenseNet (D3Net) block is combined with a multi-scale feature extraction block to explore large-scale contextual information. In this way, the benefits of local and global features can be fully leveraged to improve speech reconstruction. D3Net uses multi-dilated convolution, which applies different dilation factors within a single layer to model multiple resolutions simultaneously. This speeds the growth of the receptive field and avoids the aliasing problem that occurs when dilated convolution is naively added to the DenseNet model. Additionally, the efficient channel attention (ECA) module implements a novel cross-channel interaction without dimensionality reduction. In module testing, choosing an adaptive kernel size for the ECA significantly improved network performance. We incorporate the D3Net and ECA modules into the proposed model for better feature extraction and utterance-level context aggregation. Experimental results show that the proposed ECAD3MUNet outperforms the baseline models in objective speech quality and intelligibility scores.
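The ECA step mentioned in the abstract (cross-channel interaction without dimensionality reduction, with an adaptive kernel size) can be illustrated with a minimal sketch. It assumes the kernel-size heuristic from the original ECA-Net work, the nearest odd integer to (log2(C) + b) / gamma with gamma = 2 and b = 1; the abstract does not confirm that ECAD3MUNet uses these exact values. The function names `eca_kernel_size` and `eca` are hypothetical, and the uniform convolution weights stand in for weights that would be learned in a real model.

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1-D kernel size (assumed ECA-Net heuristic): odd value near (log2(C) + b) / gamma."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1

def eca(features, gamma=2, b=1, weights=None):
    """Apply channel attention to `features`, a list of C channels (each a flat list of floats)."""
    c = len(features)
    k = eca_kernel_size(c, gamma, b)
    if weights is None:
        weights = [1.0 / k] * k  # placeholder for learned 1-D conv weights (assumption)
    pad = k // 2
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(ch) / len(ch) for ch in features]
    padded = [0.0] * pad + pooled + [0.0] * pad
    # Local cross-channel interaction: 1-D convolution along the channel axis,
    # with no reduction of the channel dimension, followed by a sigmoid gate.
    gates = []
    for i in range(c):
        s = sum(weights[j] * padded[i + j] for j in range(k))
        gates.append(1.0 / (1.0 + math.exp(-s)))
    # Excite: rescale every channel by its gate.
    return [[g * x for x in ch] for g, ch in zip(gates, features)]
```

Under this heuristic the kernel grows slowly with the channel count, so each gate depends only on a few neighbouring channels rather than on a fully connected bottleneck.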
engineering, electrical & electronic