Hierarchical Stress Modeling in Mandarin Text-to-Speech

Ya Li, Jianhua Tao, Xiaoying Xu
DOI: https://doi.org/10.21437/interspeech.2011-529
2011-01-01
Abstract: Automatic stress prediction is helpful for both speech synthesis and natural speech understanding. This paper proposes a novel hierarchical method for modeling Mandarin stress. The top level emphasizes stressed syllables, while the bottom level, for the first time, focuses on unstressed syllables because of their importance to both the naturalness and the expressiveness of synthetic speech. A Maximum Entropy model is adopted to predict the stress structure from textual features. Experiments show that the method successfully captures both the macro- and micro-characteristics of stress. The F-scores of the two levels of stress prediction are 73.3% and 78.7%, respectively, which are satisfactory compared with other prosody prediction tasks.
Index Terms: Text-to-Speech, prosody, stress, Mandarin