Text-based sentential stress prediction using continuous lexical embedding for Mandarin speech synthesis

Yibin Zheng, Ya Li, Zhengqi Wen, Bin Liu, Jianhua Tao
DOI: https://doi.org/10.1109/ISCSLP.2016.7918425
2016-01-01
Abstract: Stress is an important parameter for prosody processing in speech synthesis. However, predicting stress from text analysis is difficult because of the complex information involved. In this paper, we explore the novel use of continuous lexical embeddings and a bidirectional long short-term memory recurrent neural network (BLSTM) model for sentential stress prediction in Mandarin speech synthesis. We investigate augmenting the baseline features with word representations derived from text, which provide a continuous embedding of the lexicon in a low-dimensional space. Although these features are learned in an unsupervised fashion, they capture semantic and syntactic properties that make them amenable to stress prediction. We deploy various embedding models for Mandarin sentential stress prediction and show substantial gains (a relative gain of approximately 7.4% in F1 score).
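The feature augmentation and the evaluation metric described in the abstract can be illustrated with a minimal sketch. It assumes per-word baseline feature vectors, an embedding lookup table (a plain dict standing in for vectors from an unsupervised embedding model), a zero-vector fallback for out-of-vocabulary words, and binary stress labels (1 = stressed). All function names are hypothetical; the sketch stops before the BLSTM, which would consume the augmented vectors as its input sequence, and it is not the authors' implementation.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for embeddings learned without supervision on a text corpus.
EmbeddingTable = Dict[str, List[float]]


def augment_features(
    words: Sequence[str],
    baseline: Sequence[Sequence[float]],
    table: EmbeddingTable,
    dim: int,
) -> List[List[float]]:
    """Concatenate each word's baseline features with its lexical embedding.

    Assumption: out-of-vocabulary words get a zero vector of length `dim`.
    """
    if len(words) != len(baseline):
        raise ValueError("words and baseline must be aligned one-to-one")
    augmented = []
    for word, feats in zip(words, baseline):
        emb = table.get(word, [0.0] * dim)
        if len(emb) != dim:
            raise ValueError(f"embedding for {word!r} has the wrong dimension")
        augmented.append(list(feats) + list(emb))
    return augmented


def f1_score(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1 score for the positive (stressed = 1) class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        # No correctly predicted stressed words: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_gain(new: float, base: float) -> float:
    """Relative improvement of `new` over `base`, as a fraction of `base`."""
    return (new - base) / base
```

For example, with gold labels `[0, 1, 0, 1]` and predictions `[0, 1, 1, 1]`, precision is 2/3 and recall is 1, so `f1_score` returns 0.8. A relative gain is measured against the baseline score, so a figure like the reported 7.4% refers to `relative_gain(f1_with_embeddings, f1_baseline)`, not to an absolute difference in F1 points.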