Voice Conversion with UBM and Speaker-Specific Model Adaptation

Chunlei Zhu, Yibiao Yu
DOI: https://doi.org/10.1109/icosp.2012.6491548
2012-01-01
Abstract: Traditional voice conversion algorithms usually rely on a parallel speech corpus and joint training, but parallel data is difficult to obtain and such systems are inflexible to extend in practical applications. This paper presents a non-parallel, non-joint training algorithm for voice conversion that uses a Universal Background Model (UBM) and Maximum a Posteriori (MAP) adaptation. First, a UBM reflecting the speaker-independent statistical distribution of features is trained on non-parallel speech samples from all speakers. Then, with the UBM acting as the prior model, each speaker-specific model is derived using a new parameter estimation based on MAP adaptation. Experimental results show that the proposed method achieves conversion performance equivalent to the traditional parallel-corpus-based method while making the system more flexible to extend.
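To make the UBM-as-prior idea concrete, the following is a minimal sketch of classical mean-only relevance MAP adaptation for a diagonal-covariance Gaussian mixture. It is an assumption-laden illustration, not the paper's method: the abstract does not specify the "new parameter estimation", so the standard relevance-factor update is swapped in here. The function names `posteriors` and `map_adapt_means`, the `relevance` parameter and its default value, and the assumption that the UBM is a diagonal GMM given by `weights`, `means`, and `variances` are all hypothetical.

```python
import numpy as np


def posteriors(weights, means, variances, X):
    """Component responsibilities of a diagonal-covariance GMM, shape (T, K)."""
    # Difference between every frame and every component mean: (T, K, D)
    diff = X[:, None, :] - means[None, :, :]
    # Log normalising constant of each diagonal Gaussian: (K,)
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)
    # Log-likelihood of each frame under each component: (T, K)
    log_lik = log_norm[None, :] - 0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2)
    log_post = np.log(weights)[None, :] + log_lik
    # Subtract the row maximum before exponentiating for numerical stability
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def map_adapt_means(weights, means, variances, X, relevance=16.0):
    """Mean-only MAP adaptation of UBM means toward one speaker's frames X.

    Assumption: classical relevance-factor update, used here as a stand-in
    for the paper's own (unspecified) estimation formula.
    """
    post = posteriors(weights, means, variances, X)        # (T, K)
    n_k = post.sum(axis=0)                                 # soft frame counts
    # Posterior-weighted mean of the speaker data per component
    e_k = (post.T @ X) / np.maximum(n_k, 1e-10)[:, None]
    # Data-dependent interpolation weight: more frames -> trust the data more
    alpha = (n_k / (n_k + relevance))[:, None]
    return alpha * e_k + (1.0 - alpha) * means
```

In this sketch, components that receive many frames from the speaker move toward that speaker's data, while components that receive almost none stay at the UBM prior. Because every speaker model is derived from the same UBM, a new speaker could in principle be added by adapting once on that speaker's own data, without parallel recordings or retraining the existing models.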