A Superposed Prosodic Model for Chinese Text-To-Speech Synthesis
GP Chen, G Bailly, QF Liu, RH Wang
DOI: https://doi.org/10.1109/chinsl.2004.1409615
2004-01-01
Abstract: This paper presents the application of the trainable SFC superpositional prosodic model to Chinese. Within the SFC model, prosodic parameters (F0, syllabic lengthening) are interpreted as the superposition of overlapping multiparametric contours. These contours are associated with high-level prosodic features operating at different scopes, such as tones, stress, prosodic boundaries, and the part of speech of words. Each feature label corresponds to a metalinguistic function (morphological, lexical, syntactic, attitudinal, etc.), which is represented by a neural network. The observed contour is the sum of the outputs of the corresponding neural networks. An analysis-by-synthesis scheme is implemented for automatic learning. The model handles the concatenation of neighboring units well. The RMSE of F0 prediction is 2.34 semitones (st, referenced to 200 Hz), and the correlation is 0.86. Perceptual experiments show that the predicted prosody is appropriate and fluent.
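The superposition and the evaluation measures mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed lists `tone_contour` and `boundary_contour` are hypothetical stand-ins for the outputs of the per-function neural networks, the observed F0 values are invented for illustration, and all function names are assumptions. Only the 200 Hz reference and the use of semitones, RMSE, and correlation come from the abstract.

```python
import math

REF_HZ = 200.0  # reference frequency stated in the abstract


def hz_to_semitones(f0_hz, ref_hz=REF_HZ):
    """Convert an F0 value in Hz to semitones relative to ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def superpose(contours):
    """Sum equally long contours point by point."""
    return [sum(points) for points in zip(*contours)]


def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def correlation(predicted, observed):
    """Pearson correlation coefficient between two sequences."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


# Hypothetical contributions (in semitones) of two overlapping functions.
# In the SFC model these would be produced by trained neural networks.
tone_contour = [1.0, -0.5, 0.5, -1.0]
boundary_contour = [2.0, 1.0, 0.0, -1.0]

# The predicted contour is the sum of the contributing contours.
predicted_st = superpose([tone_contour, boundary_contour])

# Invented observed F0 values in Hz, converted to semitones re 200 Hz.
observed_hz = [240.0, 205.0, 200.0, 180.0]
observed_st = [hz_to_semitones(f) for f in observed_hz]

print("predicted (st):", predicted_st)
print("RMSE (st):", rmse(predicted_st, observed_st))
print("correlation:", correlation(predicted_st, observed_st))
```

Working in semitones rather than Hz makes the additive combination of contours and the error measure independent of a speaker's absolute pitch range, which is presumably why the abstract reports the RMSE on that scale.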