NVCGAN: Leveraging Generative Adversarial Networks for Robust Voice Conversion

Guoyu Zhang, Jingrui Liu, Wenhao Bi, Guangcheng Dongye, Li Zhang, Ming Jing, Jiguo Yu
DOI: https://doi.org/10.1007/978-981-97-5666-7_28
2024-01-01
Abstract: Research on voice conversion under non-parallel text conditions has recently made great progress toward improving the naturalness and speaker similarity of converted speech while retaining most of the spoken content. This paper proposes a voice conversion method that integrates multiple modules by combining a generative adversarial network with an attention mechanism. Both subjective and objective evaluations show that the model generates high-quality speech representations corresponding to the target speaker and captures the speaker's identity. It effectively improves the naturalness of the synthesized speech, further enhances similarity to the target speaker, and is robust to noise and acoustic conditions.