Innovative Speaker-Adaptive Style Transfer VAE-WadaIN for Enhanced Voice Conversion in Intelligent Speech Processing

Chen Liu, Minghan Guo, Jiaojuan Wang, Liangyuan Xue
DOI: https://doi.org/10.1109/ISCTIS63324.2024.10699199
2024-07-12
Abstract: Voice Conversion (VC) is vital in intelligent speech processing; it aims to alter the timbre of speech while preserving its linguistic content. Existing methods often neglect speaking style, including emotion and intonation, which leads to reliance on inadequate datasets and suboptimal timbre accuracy. To address this, we propose Speaker-Adaptive Style-Transfer VAE-WadaIN, which uses Variational Autoencoders (VAEs) to learn a personalized Gaussian distribution for each speech sample. Sampling from and reconstructing through the VAE mitigates noise, while WadaIN separates timbre from content. This enables adaptive recombination of timbre and content, yielding diverse styles while retaining semantic information. Our method effectively transforms non-linguistic voice attributes. Experimental results confirm the effectiveness of the style transfer, showing improved data-use efficiency and timbre similarity. Our approach achieves a Mandarin speech rating comparable to that of top speakers, which makes bilingual timbre transformation promising for Mandarin instruction and supports language education through voice conversion.
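The abstract names two building blocks without giving their equations: a VAE that maps each sample to a Gaussian and samples from it, and WadaIN, which separates timbre from content. The sketch below is a minimal, hypothetical illustration of those ideas, not the authors' implementation. It assumes the encoder has already produced a per-dimension mean and log-variance, draws a latent with the standard reparameterization trick, and computes the usual KL term against a unit Gaussian. Because the abstract does not define WadaIN, plain adaptive instance normalization (AdaIN) is swapped in as a related but distinct stand-in: content features are normalized per channel over time, then re-scaled with a reference speaker's per-channel statistics. All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

# Hypothetical sketch. Plain AdaIN stands in for WadaIN, whose exact
# formulation is not given in the abstract.

Vector = List[float]
Frames = List[Vector]  # time x channels


def reparameterize(mu: Vector, log_var: Vector, rng: random.Random) -> Vector:
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (VAE reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Vector, log_var: Vector) -> float:
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def channel_stats(frames: Frames, eps: float = 1e-5) -> Tuple[Vector, Vector]:
    """Per-channel mean and standard deviation over time (instance-norm statistics)."""
    n = len(frames)
    channels = len(frames[0])
    means = [sum(f[c] for f in frames) / n for c in range(channels)]
    stds = [
        math.sqrt(sum((f[c] - means[c]) ** 2 for f in frames) / n + eps)
        for c in range(channels)
    ]
    return means, stds


def adain(content: Frames, style: Frames) -> Frames:
    """Normalize content per channel, then apply the style's mean and std (stand-in for WadaIN)."""
    c_mu, c_sigma = channel_stats(content)
    s_mu, s_sigma = channel_stats(style)
    return [
        [s_sigma[c] * (f[c] - c_mu[c]) / c_sigma[c] + s_mu[c] for c in range(len(f))]
        for f in content
    ]
```

In use, `adain(content_frames, reference_frames)` keeps the temporal shape of the source utterance (a proxy for linguistic content) while its per-channel mean and spread take on those of the reference speaker (a proxy for timbre). This statistics-swapping view is a common intuition for instance-normalization-based VC and is assumed here rather than taken from the paper.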
Computer Science