The USTC System for Blizzard Challenge 2012

Zhen-Hua Ling, Xian-Jun Xia, Yang Song, Chen-Yu Yang, Ling-Hui Chen, Li-Rong Dai
DOI: https://doi.org/10.21437/blizzard.2012-10
2012-01-01
Abstract: This paper introduces the speech synthesis system developed by USTC for Blizzard Challenge 2012. This year, an audiobook speech corpus serves as the training data for system construction. As in our previous systems, we follow the hidden Markov model (HMM) based unit selection and waveform concatenation approach to build the system from this corpus. Because recording conditions are inconsistent and the narrator’s expressiveness varies within the corpus, we add channel-related and expressiveness-related labels to each sentence, in addition to the conventional segmental and prosodic labels. The evaluation results of Blizzard Challenge 2012 show that our system performs well in all evaluation tests, which demonstrates the effectiveness of the HMM-based unit selection approach in coping with a non-standard speech synthesis corpus.
Index Terms: speech synthesis, unit selection, hidden Markov model