Using Different Models to Label the Break Indices for Mandarin Speech Synthesis

YQ Shao, YZ Zhao, JQ Han, T Liu
DOI: https://doi.org/10.1109/icmlc.2005.1527602
2005-01-01
Abstract: A high-quality speech synthesis system requires effective prediction of break indices. This paper uses a large-scale corpus annotated with five-tier break indices according to C-TOBI. On this corpus, three models, namely an N-gram model, an artificial neural network, and a Markov model (MM), are employed to automatically label the break indices of unrestricted Mandarin text. The approaches differ not only in the models but also in the features they use. The results show that, among the three models, the MM gives the best result, with accuracy reaching 77.0% and an average error cost of 0.155. The three models are also compared with one another, and conclusions are drawn about the break-index labeling problem.