The WISTON Text to Speech System for Blizzard 2008

Jianhua Tao, Jian Yu, Lixing Huang, Fangzhou Liu, Huibin Jia, Meng Zhang
DOI: https://doi.org/10.21437/blizzard.2008-12
2008-01-01
Abstract: The WISTON system is a large-corpus-based text-to-speech (TTS) system that uses unit selection. Its text analysis front end comprises text pre-processing, word segmentation, POS tagging, phonetic transcription, and prosodic structure prediction. The prosody information (duration, F0, energy) is predicted by a CART model from the input context information. In the unit selection module, a mutual prosody constraint forms part of the concatenation cost used in the path search, while the predicted F0, duration, and energy values are used to compute the target cost. A spectrum smoothing method is also applied during speech generation. The final system was entered in the Blizzard evaluation for both the English and Mandarin tests, where it achieved good scores.
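The path search described in the abstract can be illustrated with a minimal sketch of generic unit selection. This is not the WISTON implementation: the `Unit` fields, the cost weights, and the function names are all hypothetical, and the simple F0/energy jump penalty in `concat_cost` is only a stand-in for the mutual prosody constraint, whose exact form the abstract does not give. The sketch assumes each target position has a list of candidate units and finds the lowest-cost sequence by dynamic programming (Viterbi search).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Prosodic description of a target or a candidate unit (hypothetical fields)."""
    f0: float        # mean F0 in Hz
    duration: float  # duration in seconds
    energy: float    # energy in arbitrary units


def target_cost(target: Unit, cand: Unit) -> float:
    # Distance between predicted prosody and a candidate's prosody.
    # The scaling constants are illustrative assumptions, not values from the paper.
    return (abs(target.f0 - cand.f0) / 100.0
            + abs(target.duration - cand.duration) / 0.1
            + abs(target.energy - cand.energy))


def concat_cost(prev: Unit, cur: Unit) -> float:
    # Stand-in join cost: penalize F0 and energy jumps between adjacent units.
    return abs(prev.f0 - cur.f0) / 100.0 + abs(prev.energy - cur.energy)


def select_units(targets, candidates):
    """Return (path, total_cost), where path[t] indexes into candidates[t]."""
    # Cost of the best path ending in each candidate of the first position.
    best = [target_cost(targets[0], c) for c in candidates[0]]
    back = []  # back[t - 1][i]: best predecessor of candidate i at position t
    for t in range(1, len(targets)):
        new_best, pointers = [], []
        for cur in candidates[t]:
            costs = [best[j] + concat_cost(prev, cur)
                     for j, prev in enumerate(candidates[t - 1])]
            j_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[j_min] + target_cost(targets[t], cur))
            pointers.append(j_min)
        best = new_best
        back.append(pointers)
    # Trace back from the cheapest final candidate.
    idx = min(range(len(best)), key=best.__getitem__)
    total = best[idx]
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return path, total
```

In this sketch the target cost pulls each chosen unit toward the predicted prosody, while the join cost discourages abrupt prosodic changes between neighbors; a real system would typically add spectral join features and tuned weights.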