Data mining for learning Mandarin prosodic models

Tingshao Zhu, Wen Gao
2000-01-01
Jisuanji Xuebao/Chinese Journal of Computers
Abstract: Mandarin prosodic models, which mainly describe the variation of pitch, are very important in speech research and speech synthesis. The models now used in most Chinese text-to-speech (TTS) systems are constructed by experts, qualitatively and with low precision. In this paper, data mining is used to extract more accurate prosodic patterns from a large database of actual Mandarin speech, in order to improve the naturalness and intelligibility of synthesized speech. In data preprocessing, typical prosody models are found by clustering analysis, and the original pitches extracted from sentences are discretized using these classic pitch models. The resulting clusters, together with linguistic features obtained by text parsing (tone combination, word length, part of speech (POS), syllable position in word, and word position in phrase), are used to build the training data. An artificial neural network (ANN) and decision trees are each trained on this integrated data to learn the prosodic models of pitch variation. Two decision trees are constructed with C4.5 to predict the classic pitch model and the length of pitch, and a backpropagation (BP) network is used to learn the mapping between the linguistic features and the mean value of pitch. Encouraging experimental results show the effectiveness of the proposed data mining method.
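The preprocessing step described in the abstract (clustering pitch contours into a few typical models, then replacing each contour with the label of its nearest model) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the plain k-means with farthest-point initialization, the Euclidean distance, the fixed-length contours, and the choice to subtract each contour's mean before clustering are all assumptions made for illustration. The cluster label would serve as a decision-tree target and the mean pitch as a regression target for a BP network.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length contours."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def split_mean_and_shape(contour):
    """Assumed normalization: separate the mean pitch from the contour shape."""
    mean = sum(contour) / len(contour)
    return mean, [p - mean for p in contour]

def nearest(shape, centroids):
    """Index of the centroid closest to the given shape."""
    return min(range(len(centroids)), key=lambda j: distance(shape, centroids[j]))

def kmeans(shapes, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [list(shapes[0])]
    while len(centroids) < k:
        # Pick the shape farthest from all centroids chosen so far.
        far = max(shapes, key=lambda s: min(distance(s, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for s in shapes:
            groups[nearest(s, centroids)].append(s)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def prepare_targets(contours, k):
    """Return (cluster labels, mean pitches, centroids) for training."""
    pairs = [split_mean_and_shape(c) for c in contours]
    means = [m for m, _ in pairs]
    shapes = [s for _, s in pairs]
    centroids = kmeans(shapes, k)
    labels = [nearest(s, centroids) for s in shapes]
    return labels, means, centroids
```

With two rising and two falling toy contours, `prepare_targets(contours, 2)` assigns one label to the rising pair and another to the falling pair. In a full pipeline these labels would be joined with the linguistic features from text parsing to form the training rows.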