[Figure: system block diagram with Training and Synthesis stages. Recoverable block labels: Text, Text Analysis, Syntactic Parsing, Speech Database, Data Selection, Feature Extraction, Training of HMMs, HMMs, Speech Synthesis.]

Yansuo Yu, Fengyun Zhu, Xiangang Li, Yi Liu, Jun Zou, Yuning Yang, Guilin Yang, Ziye Fan, Xihong Wu
2013-01-01
Abstract: This paper introduces the SHRC-Ginkgo speech synthesis system for Blizzard Challenge 2013. The system uses a unit-selection approach and is built from an audiobook speech corpus. To handle roughly labeled corpora containing several hundred hours of speech, it adopts lightly supervised acoustic model training from speech recognition to select clean speech data with accurate text. Moreover, rich syntactic contexts, rather than prosodic structure, are used to refine the traditional acoustic models. Because these contexts come from automatic syntactic parsing, corpora of tens or even hundreds of hours can be labeled automatically, avoiding time-consuming and expensive manual prosodic annotation. To address the memory growth and running-time burden of acoustic model training on large-scale corpora, a fast training method that preserves the accuracy of the acoustic model is implemented. Subjective evaluation results show that the system performs well in almost all evaluation tests, especially in the case of large-scale corpora.
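The abstract does not state the criterion used to pick clean speech data. As a minimal sketch of one common way lightly supervised selection can work, the Python below assumes each utterance has a rough transcript and a recognizer hypothesis, and keeps utterances whose word error rate falls under a threshold. The function names, the `max_wer` parameter, and the 0.1 default are illustrative assumptions, not details from the paper.

```python
from typing import List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def select_clean(utts: List[Tuple[str, str, str]],
                 max_wer: float = 0.1) -> List[str]:
    """Return IDs of utterances whose rough text agrees with the ASR output.

    Each item is (utterance_id, rough_text, asr_text). The threshold is a
    hypothetical tuning parameter, not a value reported by the authors.
    """
    kept = []
    for utt_id, rough_text, asr_text in utts:
        ref = rough_text.lower().split()
        hyp = asr_text.lower().split()
        if not ref:
            continue  # no reference text, so agreement cannot be measured
        wer = word_edit_distance(ref, hyp) / len(ref)
        if wer <= max_wer:
            kept.append(utt_id)
    return kept
```

A stricter threshold trades corpus size for transcript accuracy; with several hundred hours of audiobook speech available, discarding mismatched utterances is usually affordable.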