Emotional Speech Generation by Using Statistic Prosody Conversion Methods

Jianhua Tao, Aijun Li
DOI: https://doi.org/10.1007/978-1-84800-306-4_8
2009-01-01
Abstract: This chapter introduces prosody conversion models for emotional speech generation based on a Gaussian Mixture Model (GMM) and a Classification And Regression Tree (CART) model. Unlike rule-based or linear modification methods, the GMM and CART models try to map the subtle prosody distributions between neutral and emotional speech. A pitch target model optimized to describe Mandarin F0 contours is also introduced. For all conversion methods, a Deviation of Perceived Expressiveness (DPE) measure is created to evaluate the expressiveness of the output speech. The results show that the GMM method is more suitable for a small training set, whereas the CART method gives better emotional speech output when trained on a large, context-balanced corpus. The methods discussed in the chapter indicate ways to generate emotional speech in speech synthesis. The objective and subjective evaluation processes are also analyzed. These results support the use of text with neutral semantic content in databases for emotional speech synthesis.