Multiple Generator GAN with Self-Attention for Speech Enhancement

Bismark Kweku Asiedu Asante, Hiroki Imamura
DOI: https://doi.org/10.2139/ssrn.4050483
2022-01-01
SSRN Electronic Journal
Abstract: A variety of generative adversarial networks (GANs) have been used to enhance distorted speech signals. This variety stems from the different implementations devised to overcome the challenges of enhancing noisy speech. One such challenge is mode collapse. Speech enhancement tasks often use conditional GANs (cGANs), which are prone to mode collapse. Our research proposes a GAN with multiple generators, an approach that has been proven to remedy mode collapse, combined with self-attention layers to synthesize clean speech from a raw, noisy speech signal. The proposed approach efficiently removes various kinds of noise from a given sample by generating a synthetic speech signal with the multiple generators. It also remedies obscure temporal dependencies by adding self-attention layers to the generators. In our experiments, the model achieved a Mean Opinion Score (MOS) of 3.25, an improvement over the baseline results reported for the SASEGAN (Phan et al., 2021) and SEGAN (Pascual et al., 2017) models. The experiments indicate that including self-attention layers within the multiple generators helps capture temporal dependencies and remove various noises. The layers also improve the sequence-to-sequence capabilities of the multiple generators, which helps overcome the mode collapse that occurs during GAN training.
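To make the idea of self-attention inside several generators more concrete, the following is a minimal, illustrative sketch and not the authors' implementation. It assumes scaled dot-product self-attention with a residual connection applied to a short sequence of feature frames, and it stands in for each "generator" with nothing more than an independent set of randomly initialised projection matrices. All names and sizes (`self_attention`, `random_matrix`, `T`, `D`, `D_K`, `NUM_GENERATORS`) are hypothetical; the abstract does not specify the layer design or how the generators' outputs are combined, so no combination step is shown.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames, w_q, w_k, w_v):
    """Scaled dot-product self-attention with a residual connection.

    frames: T x D feature sequence (assumed layout, one row per time step).
    Returns the T x D output and the T x T attention weights.
    """
    q, k, v = matmul(frames, w_q), matmul(frames, w_k), matmul(frames, w_v)
    scale = math.sqrt(len(q[0]))
    # Each time step scores every other time step, so distant frames can interact.
    scores = [[sum(a * b for a, b in zip(q_t, k_s)) / scale for k_s in k] for q_t in q]
    weights = [softmax(row) for row in scores]
    attended = matmul(weights, v)
    output = [[x + a for x, a in zip(x_row, a_row)]
              for x_row, a_row in zip(frames, attended)]
    return output, weights


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, D, D_K, NUM_GENERATORS = 6, 4, 2, 3  # hypothetical toy sizes

frames = random_matrix(T, D, rng)  # stand-in for noisy-speech features

# Assumption: each generator owns its own attention parameters.
generators = [
    {
        "w_q": random_matrix(D, D_K, rng),
        "w_k": random_matrix(D, D_K, rng),
        "w_v": random_matrix(D, D, rng),
    }
    for _ in range(NUM_GENERATORS)
]

outputs, all_weights = [], []
for params in generators:
    out, weights = self_attention(frames, **params)
    outputs.append(out)
    all_weights.append(weights)
```

Because every generator has its own parameters, the same input yields a different output from each one. That diversity is the general motivation for multi-generator GANs as a remedy for mode collapse.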