DBLSTM-Based Multi-Task Learning for Pitch Transformation in Voice Conversion
Runnan Li, Zhiyong Wu, Helen Meng, Lianhong Cai
DOI: https://doi.org/10.1109/iscslp.2016.7918466
2016-01-01
Abstract: While both spectral and prosody transformation are important for voice conversion (VC), traditional methods have focused on converting spectral features, with less emphasis on prosody transformation. This paper presents a novel pitch transformation method for VC. Because spectral features and fundamental frequency have been shown to be correlated in pitch perception, a well-converted spectrum should benefit pitch transformation. Motivated by this, a multi-task learning (MTL) framework based on a deep bidirectional long short-term memory (DBLSTM) recurrent neural network (RNN) is proposed for pitch transformation in VC. The DBLSTM models long- and short-term dependencies across speech frames for spectral conversion; the converted spectrum and the source pitch contour are then modeled jointly to generate the converted target pitch contour and the voiced/unvoiced flag; these tasks are combined in the MTL framework so that each improves the performance of the others. Experimental results indicate that the proposed method outperforms conventional approaches in pitch transformation.
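The abstract does not state how the three tasks (spectral conversion, pitch contour, voiced/unvoiced flag) are combined during training. As a rough illustration only, the sketch below assumes a weighted sum of per-task losses: mean squared error for the spectrum, mean squared error for pitch on voiced frames, and binary cross-entropy for the voiced/unvoiced flag. The function names, the weights `w_f0` and `w_vuv`, and the choice of loss terms are all hypothetical and are not taken from the paper.

```python
import math


def mse(pred, target):
    """Mean squared error over two equal-length sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def bce(prob, label, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(prob, label):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(prob)


def multitask_loss(spec_pred, spec_true, f0_pred, f0_true,
                   vuv_prob, vuv_true, w_f0=1.0, w_vuv=1.0):
    """Hypothetical joint objective: spectral + w_f0 * pitch + w_vuv * V/UV.

    spec_pred / spec_true: list of frames, each a list of spectral features.
    f0_pred / f0_true: one pitch value per frame.
    vuv_prob: predicted probability that each frame is voiced.
    vuv_true: 1 for voiced frames, 0 for unvoiced frames.
    """
    # Spectral term: per-frame MSE averaged over all frames.
    spec_term = sum(mse(p, t) for p, t in zip(spec_pred, spec_true)) / len(spec_pred)

    # Pitch term: only frames that are truly voiced carry a meaningful pitch value.
    voiced = [i for i, v in enumerate(vuv_true) if v == 1]
    if voiced:
        f0_term = mse([f0_pred[i] for i in voiced], [f0_true[i] for i in voiced])
    else:
        f0_term = 0.0

    # Voiced/unvoiced term: binary classification loss over all frames.
    vuv_term = bce(vuv_prob, vuv_true)

    return spec_term + w_f0 * f0_term + w_vuv * vuv_term
```

In a shared network, all three terms would be backpropagated through the same recurrent layers, which is one common way for auxiliary tasks to influence each other; whether the paper weights or structures its tasks this way is not stated in the abstract.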