A Novel Hybrid Mandarin Speech Synthesis System Using Different Base Units for Model Training and Concatenation

Ran Zhang, Jianhua Tao, Ya Li, Zhengqi Wen
DOI: https://doi.org/10.1109/icassp.2014.6853605
2014-01-01
Abstract: Hybrid speech synthesis, in which an acoustic model trained under the Maximum Likelihood criterion is used to select suitable candidate units from the corpus, has recently become an active research topic. The performance of such a system depends on the size of the base training unit and of the base candidate unit. Most existing hybrid systems use the same kind of base unit, such as the syllable or the phone, for both model training and concatenation. In Mandarin, initials and finals are the fundamental elements of pronunciation and are usually chosen as the base training unit for statistical parametric TTS systems. This paper proposes a new hybrid Mandarin TTS system that uses initials/finals for model training and syllables for concatenation. Objective and subjective evaluations show that the proposed system outperforms traditional systems that use the same base unit for both processes, on corpora of 4000 and 6000 sentences.
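The idea of training on initials/finals while concatenating syllables can be illustrated with a minimal sketch. It is not the authors' implementation: the diagonal Gaussian models, the feature vectors, the names `Gaussian`, `SyllableCandidate` and `select_candidate`, and the toy values are all assumptions made for illustration. Each syllable candidate is scored by summing the log-likelihoods of its initial and final segments under the corresponding sub-syllable models, and the highest-scoring candidate is selected.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Gaussian:
    """Toy diagonal-covariance Gaussian standing in for one initial/final model (assumption)."""
    mean: Sequence[float]
    var: Sequence[float]

    def log_likelihood(self, x: Sequence[float]) -> float:
        # Sum of per-dimension Gaussian log-densities.
        return sum(
            -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, self.mean, self.var)
        )


@dataclass
class SyllableCandidate:
    """A syllable-sized unit from the corpus, with features for its two halves (hypothetical)."""
    unit_id: str
    initial_feat: Sequence[float]
    final_feat: Sequence[float]


def select_candidate(initial_model, final_model, candidates):
    """Pick the syllable whose initial and final segments are jointly most likely."""
    def score(c):
        return (initial_model.log_likelihood(c.initial_feat)
                + final_model.log_likelihood(c.final_feat))
    return max(candidates, key=score)


# Toy models for the initial "m" and the final "a" (values are made up).
models = {
    "m": Gaussian(mean=[1.0, 0.0], var=[1.0, 1.0]),
    "a": Gaussian(mean=[0.0, 2.0], var=[1.0, 1.0]),
}
candidates = [
    SyllableCandidate("ma_001", [1.1, 0.1], [0.0, 1.9]),
    SyllableCandidate("ma_002", [3.0, 2.0], [2.0, 0.0]),
]
best = select_candidate(models["m"], models["a"], candidates)
print(best.unit_id)
```

A real system would also include a concatenation (join) cost between neighbouring syllables and a search over the whole utterance; the sketch covers only the per-unit likelihood scoring.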