Mandarin Stress Analysis And Prediction For Speech Synthesis

Ya Li, Jianhua Tao
DOI: https://doi.org/10.1007/978-3-662-45258-5_6
2015-01-01
Abstract: Expressive speech synthesis has recently received much attention. Stress (or pitch accent) is the perceptual prominence within words or utterances, and it is an important feature in forming the highs and lows of the pitch contour, which makes speech sound more expressive. In this chapter, we introduce a large-scale, stress-annotated continuous Mandarin corpus. The stress distribution and its stability are then thoroughly analyzed with respect to rhythm level and tone pattern. Based on these results, we propose a novel hierarchical Mandarin stress modeling method. The top level emphasizes stressed syllables, while the bottom level focuses, for the first time, on unstressed syllables because of their importance to both the naturalness and the expressiveness of synthetic speech. We also carried out several experiments to assign Mandarin stress from textual features, using the classification and regression tree (CART) and maximum entropy (ME) models respectively. This work could help speech synthesis systems generate highly natural and expressive speech.