A Weakly Supervised Speech Enhancement Strategy Using Cycle-GAN

Yang Xiang, Changchun Bao, Jing Yuan
DOI: https://doi.org/10.1109/icspcc50002.2020.9259482
2020-01-01
Abstract: Speech enhancement (SE) technology has advanced significantly through the application of deep neural networks (DNNs). However, most current approaches need a parallel corpus that consists of noisy signals together with the corresponding clean speech signals and noise in the DNN training stage. This means that a large number of realistic noisy speech signals cannot easily be used to train the DNNs. As a result, the performance of the DNNs is restricted. In this research, a new weakly supervised speech enhancement approach is proposed to remove this restriction, using the cycle-consistent generative adversarial network (CycleGAN). The method has two stages. In the training stage, a forward generator is employed to estimate the ideal time-frequency (T-F) mask and an inverse generator is used to produce the noisy speech magnitude spectrum (MS). Additionally, two discriminators are used to distinguish the real clean speech and the real noisy speech, respectively, from generated speech. In the enhancement stage, the T-F mask is estimated directly by the well-trained forward generator and used for speech enhancement. Experimental results indicate that the strategy not only achieves satisfactory performance on non-parallel data, but also obtains higher scores in speech quality and intelligibility than DNN-based speech enhancement using parallel data.
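The two stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the two generator functions are hypothetical toy stand-ins for the trained networks, the constant noise floor is an arbitrary assumption, and the L1 cycle-consistency loss is the form commonly used with CycleGAN rather than one stated in the abstract. The discriminators and adversarial losses are omitted.

```python
import numpy as np

NOISE_FLOOR = 0.5  # assumed constant noise magnitude, for illustration only


def forward_generator(noisy_ms):
    """Toy stand-in for the forward generator: returns a T-F mask in [0, 1].

    Uses a Wiener-like ratio with an assumed noise floor instead of a network.
    """
    return noisy_ms ** 2 / (noisy_ms ** 2 + NOISE_FLOOR ** 2)


def inverse_generator(clean_ms):
    """Toy stand-in for the inverse generator: maps a clean-speech magnitude
    spectrum back to a noisy one by adding the assumed noise floor."""
    return clean_ms + NOISE_FLOOR


def apply_mask(noisy_ms, mask):
    """Enhancement stage: scale each T-F bin of the noisy magnitude spectrum."""
    return np.clip(mask, 0.0, 1.0) * noisy_ms


def cycle_consistency_loss(original_ms, reconstructed_ms):
    """Mean absolute (L1) difference between a spectrum and its reconstruction."""
    return float(np.mean(np.abs(original_ms - reconstructed_ms)))


# Toy magnitude spectrum with shape (frames, frequency bins).
noisy = np.array([[1.0, 2.0], [0.5, 4.0]])

# Forward cycle: noisy -> mask -> enhanced -> reconstructed noisy.
mask = forward_generator(noisy)
enhanced = apply_mask(noisy, mask)
reconstructed = inverse_generator(enhanced)

print("enhanced magnitude spectrum:\n", enhanced)
print("cycle loss:", cycle_consistency_loss(noisy, reconstructed))
```

At enhancement time only `forward_generator` and `apply_mask` would be needed; the inverse generator and the cycle loss matter only during training, where they let the model learn from noisy and clean recordings that are not paired.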