Double Adversarial Network Based Monaural Speech Enhancement for Robust Speech Recognition

Zhihao Du, Jiqing Han, Xueliang Zhang
DOI: https://doi.org/10.21437/interspeech.2020-1504
2020-01-01
Abstract: To improve the noise robustness of automatic speech recognition (ASR), generative adversarial network (GAN)-based enhancement methods have been employed as front-end processing. These methods involve a single adversarial process between an enhancement model and a discriminator. In this single adversarial process, the discriminator is encouraged to find differences between enhanced and clean speech, but the distribution of clean speech itself is ignored. In this paper, we propose a double adversarial network (DAN) that adds a second adversarial process, the adversarial generation process (AGP), which forces the discriminator not only to find these differences but also to model the clean-speech distribution. Furthermore, a functional mean square error (f-MSE) is proposed to exploit the representations learned by the discriminator. Experimental results show that both AGP and f-MSE, which are absent from previous GAN-based methods, are crucial for enhancement performance on the ASR task. Specifically, DAN achieves a 13.00% relative word error rate (WER) improvement over noisy speech on the CHiME-2 test set, significantly outperforming several recent GAN-based enhancement methods.
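A relative WER improvement is measured against the baseline WER, not in absolute percentage points: 13% relative means the WER fell by 13% of the noisy-speech WER. The minimal sketch below computes this quantity. The function name `relative_wer_reduction` and the WER values are hypothetical placeholders chosen only to illustrate the arithmetic; they are not results reported in the paper.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Relative WER reduction (in percent) of a system over a baseline.

    Both arguments are absolute WERs in percent, e.g. 20.0 for a 20% WER.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    # Improvement expressed as a fraction of the baseline WER, scaled to percent.
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Hypothetical WERs for illustration only (NOT taken from the paper):
noisy_wer = 20.0     # ASR on unprocessed noisy speech
enhanced_wer = 17.4  # ASR on enhanced speech
print(f"{relative_wer_reduction(noisy_wer, enhanced_wer):.2f}%")  # prints 13.00%
```

With these placeholder numbers, an absolute drop of only 2.6 percentage points corresponds to a 13.00% relative improvement, which is why relative and absolute WER gains should not be compared directly.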