Clustering and Feature Learning Based F0 Prediction for Chinese Speech Synthesis

Jianhua Tao, Lianhong Cai
DOI: https://doi.org/10.21437/icslp.2002-574
2002-01-01
Abstract: The paper describes a Chinese prosody model based on a clustering and feature learning method. Because Chinese is a tonal language, its prosodic features are analyzed with respect to various kinds of context information. Focusing on the notion of prosody templates, we confirm that an F0 pattern can be extracted for each syllable based on various context parameters. A statistical algorithm is used for template selection and training. Finally, the paper analyzes the error distribution of the F0 prediction results. Unlike other methods, the approach can give feedback on exactly which parameters are crucial to the successful choice of patterns. Both an acoustic validation test and a listening test show that the synthesis results are very close to human speech. The system has been widely used in applications.
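The abstract does not specify the clustering or selection algorithms, so the following is only a minimal sketch of the general idea, under stated assumptions: per-syllable F0 contours are grouped into templates with plain k-means (a stand-in, not necessarily the authors' method), and a template is chosen for each context by majority vote. All names (`cluster_templates`, `learn_selection`, `predict_f0`), the context features, and the F0 values in Hz are hypothetical toy choices.

```python
from collections import Counter, defaultdict


def distance(a, b):
    """Squared Euclidean distance between two equal-length F0 contours."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_contour(contours):
    """Point-wise mean of a non-empty list of equal-length contours."""
    n = len(contours)
    return [sum(values) / n for values in zip(*contours)]


def cluster_templates(contours, k, iterations=20):
    """Plain k-means over F0 contours (assumed stand-in for the paper's clustering).

    Seeds are taken at evenly spaced positions in the data, which is a
    simplification chosen to keep this sketch deterministic.
    """
    step = len(contours) // k
    templates = [list(contours[j * step]) for j in range(k)]
    labels = [0] * len(contours)
    for _ in range(iterations):
        # Assign each contour to its nearest template.
        labels = [
            min(range(k), key=lambda j: distance(c, templates[j]))
            for c in contours
        ]
        # Recompute each template as the mean of its members.
        for j in range(k):
            members = [c for c, lab in zip(contours, labels) if lab == j]
            if members:
                templates[j] = mean_contour(members)
    return templates, labels


def learn_selection(contexts, labels):
    """Map each context tuple to the template index seen most often with it."""
    votes = defaultdict(Counter)
    for ctx, lab in zip(contexts, labels):
        votes[ctx][lab] += 1
    return {ctx: counts.most_common(1)[0][0] for ctx, counts in votes.items()}


def predict_f0(context, selection, templates):
    """Return the F0 template selected for a context seen during training."""
    return templates[selection[context]]


# Toy training data: four F0 samples (Hz) per syllable, invented for illustration.
contours = [
    [220.0, 221.0, 219.0, 220.0],  # roughly level
    [218.0, 220.0, 221.0, 219.0],
    [180.0, 200.0, 220.0, 240.0],  # rising
    [178.0, 198.0, 222.0, 238.0],
    [240.0, 215.0, 190.0, 170.0],  # falling
    [238.0, 212.0, 192.0, 168.0],
]
# Hypothetical context parameters: (lexical tone, position in phrase).
contexts = [
    ("tone1", "initial"), ("tone1", "final"),
    ("tone2", "initial"), ("tone2", "final"),
    ("tone4", "initial"), ("tone4", "final"),
]

templates, labels = cluster_templates(contours, k=3)
selection = learn_selection(contexts, labels)
predicted = predict_f0(("tone2", "initial"), selection, templates)
```

Here `predicted` is the mean of the two rising contours. A real system would use a much richer context vector and would need a fallback for contexts that never appear in training, neither of which is covered by this sketch.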