Data Augmentation using Conditional Generative Adversarial Networks for Robust Speech Recognition

Peiyao Sheng, Zhuolin Yang, Hu Hu, Tian Tan, Yanmin Qian
DOI: https://doi.org/10.1109/ISCSLP.2018.8706651
2018-01-01
Abstract: For noise-robust speech recognition, data mismatch between training and test conditions is a significant challenge. To reduce this mismatch, the traditional approach to data augmentation usually adds noise directly to the original waveform. A recent work uses a generative adversarial network (GAN) to generate data for speech recognition. In this work, we explore conditional generative adversarial networks (cGANs) for data augmentation to further improve speech recognition in noisy environments. Two different conditions are explored: the acoustic state of each speech frame, and the original paired clean speech for each speech frame. Unlike the basic GAN, these newly designed cGANs incorporate the specific conditions into data generation and provide true labels directly. The proposed cGAN-based data augmentation approach is evaluated on both Aurora4 and AMI-SDM, which cover noise types such as additive noise, channel distortion, and reverberation. Experimental results show that the cGAN-based method consistently outperforms the GAN-based one under all noisy conditions, and that a relative WER reduction of 6% to 10% can be obtained over an advanced acoustic model.
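The two conditions named in the abstract can be illustrated with a minimal sketch of how a generator input might be assembled per frame. The abstract does not say how the conditions are fed to the network; concatenating the condition with a noise vector is a common cGAN choice and is assumed here. The function names, feature dimensions, and the one-hot encoding of the acoustic state are all hypothetical.

```python
import numpy as np

def state_conditioned_input(noise, state_ids, num_states):
    """Condition 1 (assumed form): append a one-hot acoustic-state vector to each frame's noise."""
    one_hot = np.eye(num_states, dtype=noise.dtype)[state_ids]  # shape (frames, num_states)
    return np.concatenate([noise, one_hot], axis=1)

def clean_conditioned_input(noise, clean_frames):
    """Condition 2 (assumed form): append the paired clean-speech features to each frame's noise."""
    return np.concatenate([noise, clean_frames], axis=1)

# Hypothetical sizes: 3 frames, 4-dim noise, 5 acoustic states, 40-dim clean features.
noise = np.zeros((3, 4), dtype=np.float32)
state_ids = np.array([0, 2, 1])
clean_frames = np.ones((3, 40), dtype=np.float32)

x_state = state_conditioned_input(noise, state_ids, num_states=5)
x_clean = clean_conditioned_input(noise, clean_frames)
```

With the state condition, each generated frame carries the state index it was conditioned on, which is one way to read the abstract's statement that the cGANs "provide true labels directly".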