Improved Relativistic Cycle-Consistent GAN with Dilated Residual Network and Multi-Attention for Speech Enhancement
Yutian Wang, Guochen Yu, Jingling Wang, Hui Wang, Qin Zhang
DOI: https://doi.org/10.1109/access.2020.3029417
IF: 3.9
2020-01-01
IEEE Access
Abstract: Generative adversarial networks (GANs) have been increasingly used as feature mapping functions in speech enhancement, in which generators transform noisy speech features into clean ones. This article proposes a novel speech enhancement model based on a cycle-consistent relativistic GAN with dilated residual networks and a multi-attention mechanism. Using the adversarial loss, improved cycle-consistency losses, and an identity-mapping loss, a noisy-to-clean generator G and an inverse clean-to-noisy generator F simultaneously learn the forward and backward mappings between the source and target domains. To stabilize training, we replace the vanilla GAN loss with the relativistic average GAN loss and apply spectral normalization in the discriminators so that they satisfy Lipschitz continuity. Furthermore, to reduce the signal distortion introduced by enhancement, we employ two attention-based components as a multi-attention mechanism: attention U-net gates and dilated residual self-attention blocks. With these components, the proposed generators can capture long-term dependencies between elements of the speech features and better preserve linguistic information. Experimental results on a public dataset indicate that the proposed model achieves state-of-the-art speech enhancement performance, especially in reducing speech distortion and improving overall signal quality. Compared with representative GAN-based approaches, the proposed method achieves the best performance on the STOI, CSIG, COVL, and CBAK objective metrics. Moreover, we demonstrate the contribution of each proposed component, including the relativistic average loss, the attention U-net gate, the self-attention layers, spectral normalization, and the dilation operation, through ten comparison systems.
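As an illustration of the relativistic average GAN loss mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. The abstract does not state which variant of the loss is used, so the sketch assumes the standard sigmoid cross-entropy form, in which the discriminator judges whether real samples look more realistic than the average fake sample, and vice versa. The function names and the use of plain lists of raw (pre-sigmoid) discriminator scores are assumptions made for the example.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def _softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def ra_discriminator_loss(real_scores, fake_scores):
    """Relativistic average discriminator loss (assumed cross-entropy form).

    real_scores, fake_scores: raw discriminator outputs C(x) before any
    sigmoid, for a batch of real (clean) and generated (enhanced) samples.
    """
    mean_real = _mean(real_scores)
    mean_fake = _mean(fake_scores)
    # -log(sigmoid(z)) == softplus(-z): real should score above average fake.
    loss_real = _mean([_softplus(-(r - mean_fake)) for r in real_scores])
    # -log(1 - sigmoid(z)) == softplus(z): fake should score below average real.
    loss_fake = _mean([_softplus(f - mean_real) for f in fake_scores])
    return loss_real + loss_fake


def ra_generator_loss(real_scores, fake_scores):
    # The generator objective is the same expression with the roles of
    # real and fake samples swapped.
    return ra_discriminator_loss(fake_scores, real_scores)
```

Unlike the vanilla GAN loss, the generator term here depends on the real samples as well as the generated ones, which is the property usually credited with making training more stable.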