Improving Prosodic Boundaries Prediction For Mandarin Speech Synthesis By Using Enhanced Embedding Feature And Model Fusion Approach

Yibin Zheng, Ya Li, Zhengqi Wen, Xingguang Ding, Jianhua Tao
DOI: https://doi.org/10.21437/Interspeech.2016-1060
2016-01-01
Abstract: Hierarchical prosody structure generation is an important but challenging component of speech synthesis systems. In this paper, we investigate enhanced embedding features, obtained by joint learning of character and word embeddings (CWE), and different model fusion approaches at both the character and word levels for Mandarin prosodic boundary prediction. In the CWE module, the internal structure of words and non-compositional words are considered in the word embedding, while character ambiguity is addressed by multiple-prototype character embeddings. In the model fusion module, a linear function (LF) and a gradient boosting decision tree (GBDT) are each investigated at the decision level, taking as input the important features selected by a feature ranking module. Experimental results show the effectiveness of the proposed enhanced embedding features and of the two model fusion approaches at both the character and word levels.
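As a rough illustration of decision-level fusion with a linear function, the sketch below combines boundary probabilities from a character-level and a word-level predictor by a weighted average. The abstract does not give the form of the LF, its weights, or the decision threshold, so the function names, the equal weights, the 0.5 threshold, and the example probabilities here are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Sequence


def linear_fusion(model_probs: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """Weighted average of per-model boundary probabilities at each position.

    model_probs[m][i] is model m's probability that a prosodic boundary
    follows position i. Weights are normalized by their sum.
    (Hypothetical helper; the paper's actual LF is not specified in the abstract.)
    """
    if len(model_probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(model_probs[0])
    if any(len(p) != n for p in model_probs):
        raise ValueError("all models must score the same positions")
    return [
        sum(w * p[i] for w, p in zip(weights, model_probs)) / total
        for i in range(n)
    ]


def predict_boundaries(fused: Sequence[float],
                       threshold: float = 0.5) -> List[bool]:
    """Mark a boundary wherever the fused probability reaches the threshold."""
    return [p >= threshold for p in fused]


# Made-up scores for three positions from two hypothetical predictors.
char_level = [0.9, 0.2, 0.6]
word_level = [0.7, 0.1, 0.3]

fused = linear_fusion([char_level, word_level], weights=[0.5, 0.5])
boundaries = predict_boundaries(fused)
```

With these made-up inputs, only the first position is marked as a boundary. A GBDT-based fusion would replace the fixed weighted average with a learned tree ensemble over the same per-model outputs and selected features.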