MSAF: A Multiple Self-Attention Field Method for Speech Enhancement

Minghang Chu, Jing Wang, Yaoyao Ma, Zhiwei Fan, Mengtao Yang, Chao Xu, Zhi Tao, Di Wu
DOI: https://doi.org/10.21437/interspeech.2023-886
2023-01-01
Abstract: Speech enhancement (SE) systems based on generative adversarial networks (GANs) are limited in how much they can improve speech quality and intelligibility. In this study, we propose a novel multiple self-attention field method for speech enhancement (MSAF). Models that place the self-attention layer at different positions focus on different features. The output of each model is assigned its own feature weight, which is obtained by training. We then fuse the models according to these feature weights to obtain a clean speech signal. For speech quality, the proposed method improves CBAK, CSIG, COVL, and PESQ by 8.22%, 8.52%, 9.28%, and 9.40% on average, respectively, compared with the baseline SASEGANs. The results show that MSAF comprehensively improves the performance of the baseline SASEGAN and performs better than mainstream GAN-based SE methods. Importantly, the proposed method can be extended to other GAN-based SE methods.
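The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact fusion rule, so everything below is an assumption made for illustration: one scalar score per model, a softmax to turn the scores into weights, and a weighted sum of the per-model output waveforms. The names `softmax` and `fuse_outputs` are hypothetical and are not taken from the paper, and the toy numbers are not experimental data.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn unconstrained scores into positive weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_outputs(
    outputs: Sequence[Sequence[float]], logits: Sequence[float]
) -> List[float]:
    """Weighted sum of per-model enhanced waveforms.

    Assumption: one scalar weight per model, normalized with a softmax.
    The paper's actual fusion rule may differ.
    """
    if len(outputs) != len(logits):
        raise ValueError("need one score per model output")
    length = len(outputs[0])
    if any(len(o) != length for o in outputs):
        raise ValueError("all model outputs must have the same length")
    weights = softmax(logits)
    return [
        sum(w * o[t] for w, o in zip(weights, outputs))
        for t in range(length)
    ]


# Toy stand-ins for the enhanced waveforms produced by three models
# whose self-attention layers sit at different positions.
outputs = [
    [0.10, 0.20, 0.30],
    [0.12, 0.18, 0.33],
    [0.08, 0.22, 0.27],
]
logits = [0.0, 0.0, 0.0]  # equal scores, so each model gets weight 1/3
fused = fuse_outputs(outputs, logits)
```

In a trained system the scores would be learned parameters rather than fixed values, so a model whose attention placement captures more useful features would contribute more to the fused signal.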