An Optimized Neural Network Based Prosody Model of Chinese Speech Synthesis System

JH Tao, LH Cai, H Tropf
DOI: https://doi.org/10.1109/tencon.2002.1181317
2002-01-01
Abstract: Generating high-quality pitch contours is a central problem for any TTS system, and the naturalness of synthesized pitch is still far from satisfactory. In this paper, a trainable prosody model based on a neural network is described for a Mandarin TTS system. Extensive tests show that the structure of the neural network characterizes Mandarin prosody more accurately than traditional models. The naturalness of the output is substantially improved, and the system is more flexible in practice. Furthermore, personal and task-specific characteristics are preserved. The paper adopts a fuzzy clustering algorithm to classify the pitch contours of Mandarin syllables. The algorithm proved very useful for optimizing the neural network and adapting it to the pitch contours of Mandarin.
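The abstract does not name the fuzzy clustering algorithm, so the sketch below assumes standard fuzzy c-means, the most common choice, purely to illustrate how syllable pitch contours can be grouped with graded rather than hard memberships. The contour representation (five normalized F0 samples per syllable), the deterministic max-min initialization, the function name `fuzzy_c_means`, and the toy data are all assumptions for illustration; the contours are synthetic and are not data from the paper.

```python
import math


def fuzzy_c_means(data, c, m=2.0, max_iter=300, tol=1e-6):
    """Standard fuzzy c-means (assumed variant, not necessarily the paper's).

    Returns (centers, memberships); memberships[i][j] is the degree to which
    contour i belongs to cluster j, and each row sums to 1.
    """
    # Deterministic max-min initialization: start from the first contour,
    # then repeatedly add the contour farthest from all chosen centers.
    centers = [list(data[0])]
    while len(centers) < c:
        far = max(data, key=lambda x: min(math.dist(x, ctr) for ctr in centers))
        centers.append(list(far))

    power = 2.0 / (m - 1.0)
    dim = len(data[0])
    u = []
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        u = []
        for x in data:
            d = [math.dist(x, ctr) for ctr in centers]
            if min(d) == 0.0:
                # Contour coincides with a center: give it crisp membership.
                hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
                total = sum(hits)
                u.append([h / total for h in hits])
            else:
                u.append([1.0 / sum((d[j] / d[k]) ** power for k in range(c))
                          for j in range(c)])
        # Center update: mean of contours weighted by membership^m.
        new_centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append([sum(wi * x[k] for wi, x in zip(w, data)) / total
                                for k in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


# Synthetic 5-point normalized F0 contours (hypothetical, NOT from the paper).
contours = [
    [0.80, 0.80, 0.80, 0.80, 0.80],  # level-like
    [0.82, 0.81, 0.80, 0.80, 0.79],
    [0.30, 0.40, 0.50, 0.65, 0.80],  # rising-like
    [0.28, 0.38, 0.52, 0.66, 0.82],
    [0.90, 0.70, 0.50, 0.30, 0.10],  # falling-like
    [0.88, 0.72, 0.48, 0.30, 0.12],
]
centers, u = fuzzy_c_means(contours, c=3)
# Hard labels, if needed, come from the highest membership in each row.
labels = [max(range(3), key=lambda j: row[j]) for row in u]
print(labels)
```

Unlike hard clustering, each contour keeps a membership degree in every class, so a syllable whose contour lies between prototypes (for example, one reshaped by its neighbors in connected speech) can be represented as a partial member of several classes. That graded information is one plausible reason a fuzzy method suits Mandarin pitch contours, though the exact way the paper feeds the clusters into its neural network is not described in the abstract.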