LIMI-VC: A Light Weight Voice Conversion Model with Mutual Information Disentanglement

Liangjie Huang, Tian Yuan, Yunming Liang, Zeyu Chen, Can Wen, Yanlu Xie, Jinsong Zhang, Dengfeng Ke
DOI: https://doi.org/10.1109/ICASSP49357.2023.10096399
2023-01-01
Abstract: Voice conversion (VC) models aim to convert the timbre of a source speaker to that of a target speaker. Recently, many VC models have used pre-trained models to improve performance, with good results. However, pre-trained models do not fully disentangle timbre from linguistic information, which leaves redundancy between the two that may hurt conversion performance. In this paper, we propose LIMI-VC, which reduces the redundancy between linguistic content and timbre information through mutual information disentanglement. We design the model to be lightweight, for the sake of parameter and computational efficiency at a time when pre-trained models are commonly used. Experiments show that the proposed model still improves performance over the baseline while being 15 times smaller. An out-of-domain cross-lingual inference experiment also shows that our model greatly outperforms the baseline. Our source code and audio examples will be available at: https://github.com/WongLaw/LIMI-VC.