LEARNING CROSS-LINGUAL INFORMATION WITH MULTILINGUAL BLSTM FOR SPEECH SYNTHESIS OF LOW-RESOURCE LANGUAGES

Yu Quanjie, Liu Peng, Wu Zhiyong, Kang Shiyin, Meng Helen, Cai Lianhong
DOI: https://doi.org/10.1109/icassp.2016.7472738
2016-01-01
Abstract: Bidirectional long short-term memory (BLSTM) based speech synthesis has shown great potential for improving the quality of synthetic speech. However, for low-resource languages, it is difficult to obtain a high-quality BLSTM model. BLSTM-based speech synthesis can be viewed as a transformation between input features and output features. We assume that the input and output layers of the BLSTM are language-dependent, while the hidden layers can be language-independent if trained properly. We investigate whether sufficient training data from another (auxiliary) language can benefit BLSTM training for a new (target) language that has only limited training data. In this paper, we propose 1) a multilingual BLSTM that shares hidden layers across different languages and 2) a specific training approach that can best utilize the training data from both the auxiliary and target languages. Experimental results demonstrate the effectiveness of the proposed approach. The multilingual BLSTM can learn cross-lingual information and can predict more accurate acoustic features for speech synthesis of the target language than a monolingual BLSTM trained with only data from the target language. A subjective test also indicates that the multilingual BLSTM outperforms the monolingual BLSTM in generating higher-quality synthetic speech.
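The layer-sharing idea in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the class names, the language keys "aux" and "target", all dimensions, the single shared hidden layer, the plain linear input and output layers, and the random untrained weights are assumptions chosen only to show how language-dependent input and output layers can wrap a shared bidirectional LSTM. The training approach mentioned in the abstract is not reproduced here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    # Small random weights; a real model would learn these (assumption).
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class LSTM:
    """Single-direction LSTM layer, forward pass only."""

    def __init__(self, n_in, n_hid, rng):
        self.n_hid = n_hid
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.w = [rand_matrix(n_hid, n_in + n_hid, rng) for _ in range(4)]
        self.b = [[0.0] * n_hid for _ in range(4)]

    def run(self, seq):
        h = [0.0] * self.n_hid
        c = [0.0] * self.n_hid
        outputs = []
        for x in seq:
            z = x + h  # concatenate current input and previous hidden state
            pre = [[a + b for a, b in zip(matvec(w, z), bias)]
                   for w, bias in zip(self.w, self.b)]
            i = [sigmoid(a) for a in pre[0]]
            f = [sigmoid(a) for a in pre[1]]
            o = [sigmoid(a) for a in pre[2]]
            g = [math.tanh(a) for a in pre[3]]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            outputs.append(h)
        return outputs


class BLSTM:
    """Runs one LSTM left-to-right and one right-to-left, then concatenates."""

    def __init__(self, n_in, n_hid, rng):
        self.fwd = LSTM(n_in, n_hid, rng)
        self.bwd = LSTM(n_in, n_hid, rng)

    def run(self, seq):
        forward = self.fwd.run(seq)
        backward = self.bwd.run(seq[::-1])[::-1]
        return [hf + hb for hf, hb in zip(forward, backward)]


class MultilingualBLSTM:
    """Hypothetical wrapper: per-language input/output layers, shared hidden layer."""

    def __init__(self, in_dims, out_dims, n_shared, n_hid, seed=0):
        rng = random.Random(seed)
        # Language-dependent input layers (one per language).
        self.inp = {lang: rand_matrix(n_shared, d, rng) for lang, d in in_dims.items()}
        # Hidden layer shared by every language.
        self.shared = BLSTM(n_shared, n_hid, rng)
        # Language-dependent output layers (one per language).
        self.out = {lang: rand_matrix(d, 2 * n_hid, rng) for lang, d in out_dims.items()}

    def predict(self, lang, seq):
        projected = [matvec(self.inp[lang], frame) for frame in seq]
        hidden = self.shared.run(projected)
        return [matvec(self.out[lang], frame) for frame in hidden]


# Usage: two languages with different (made-up) feature sizes share one hidden layer.
model = MultilingualBLSTM({"aux": 5, "target": 3}, {"aux": 4, "target": 2},
                          n_shared=6, n_hid=4)
target_frames = [[0.1, 0.2, 0.3] for _ in range(7)]
acoustic = model.predict("target", target_frames)
print(len(acoustic), len(acoustic[0]))
```

In this sketch, sequences from either language pass through the same `shared` weights, so under a joint training scheme the auxiliary-language data would also update the parameters used for the target language. Only the `inp` and `out` entries differ per language.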