CycleGAN-VC-GP: Improved CycleGAN-based Non-parallel Voice Conversion

Chao Wang, Yi-Biao Yu
DOI: https://doi.org/10.1109/icct50939.2020.9295938
2020-10-28
Abstract: Non-parallel voice conversion is an important but challenging task because no parallel training data are available. Recently, voice conversion based on Cycle-Consistent Generative Adversarial Networks (CycleGAN-VC) has achieved great success in this setting. However, owing to the instability of GAN training, a large gap remains between real target speech and converted speech. To reduce this gap, this paper proposes CycleGAN-VC-GP, which incorporates two new techniques: (1) zero-centered gradient penalties are used to ensure convergence of GAN training; (2) the fundamental frequency is combined with the spectrum to improve prosody conversion. Both subjective and objective evaluations showed that the proposed method achieved higher speaker similarity and comparable speech quality relative to the state-of-the-art method based on CycleGAN-VC.
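The abstract does not state the exact form of the zero-centered gradient penalty, so the following is only a minimal sketch under assumptions. It uses one common form, (gamma / 2) times the mean squared norm of the discriminator's input gradient on real samples, which pushes the gradient toward zero rather than toward a norm of one. The toy discriminator `D(x) = tanh(w . x + b)`, the function names, and the default `gamma=10.0` are all hypothetical and are not taken from the paper.

```python
import math

def discriminator(x, w, b):
    """Toy discriminator D(x) = tanh(w . x + b); hypothetical, for illustration only."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)

def input_gradient(x, w, b):
    """Analytic gradient of the toy discriminator with respect to its input x."""
    d = discriminator(x, w, b)
    scale = 1.0 - d * d  # derivative of tanh at the pre-activation
    return [scale * wi for wi in w]

def zero_centered_gp(real_batch, w, b, gamma=10.0):
    """Assumed penalty form: (gamma / 2) * mean over real samples of ||grad_x D(x)||^2."""
    total = 0.0
    for x in real_batch:
        g = input_gradient(x, w, b)
        total += sum(gi * gi for gi in g)
    return 0.5 * gamma * total / len(real_batch)

# Usage: the penalty would be added to the discriminator loss during training.
penalty = zero_centered_gp([[0.0, 0.0]], w=[0.5, -0.5], b=0.0)
```

In a real system the gradient would come from automatic differentiation through the actual discriminator network rather than from a closed-form expression; the analytic toy model here only keeps the sketch self-contained.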