Automatic Segmentation for TTS Units

WANG Li-juan, CAO Zhi-gang
DOI: https://doi.org/10.3969/j.issn.1000-7180.2005.12.003
2005-01-01
Abstract: Correct unit segmentation, though laborious, is crucial to the performance of a concatenation-based TTS system. This paper proposes a two-step procedure for automatic unit segmentation, which coarsely segments the speech data in the first step and refines the segment boundaries in the second step. A new Context-Dependent Boundary Model (CDBM) is proposed to describe the evolution across a segment boundary. To reduce the amount of manual segmentation required, a Classification and Regression Tree (CART) is used to organize the available data for more efficient use. Acoustically similar boundaries are clustered together, and the corresponding tied CDBMs are trained and used for boundary refinement in the second step. A series of experiments identifies the optimal CDBM parameters and training conditions. With about 1,000 manually segmented sentences as CDBM training data, the accuracy of Mandarin syllable segmentation rises from 78.7% to 91.5%.
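The two-step idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `score` callable is a hypothetical stand-in for a tied boundary model, the search radius `window` and the tolerance `tol` are assumed parameters that the abstract does not specify, and the coarse boundaries are taken as given from some first-pass segmenter.

```python
from typing import Callable, List, Sequence


def refine_boundaries(
    coarse: Sequence[int],
    score: Callable[[int, int], float],
    n_frames: int,
    window: int = 5,
) -> List[int]:
    """Step 2: move each coarse boundary to the best-scoring frame nearby.

    coarse   -- frame indices produced by the first (coarse) pass
    score    -- hypothetical boundary model: score(k, t) says how well
                frame t matches the model assigned to boundary k
    n_frames -- total number of frames in the utterance
    window   -- search radius in frames (assumed, not from the paper)
    """
    refined = []
    for k, t0 in enumerate(coarse):
        lo = max(0, t0 - window)
        hi = min(n_frames - 1, t0 + window)
        # Keep the candidate frame with the highest boundary-model score.
        best = max(range(lo, hi + 1), key=lambda t: score(k, t))
        refined.append(best)
    return refined


def boundary_accuracy(
    auto: Sequence[int], manual: Sequence[int], tol: int
) -> float:
    """Fraction of automatic boundaries within `tol` frames of the manual ones."""
    hits = sum(abs(a - m) <= tol for a, m in zip(auto, manual))
    return hits / len(manual)
```

With a toy score that peaks at the manually marked frame, for example `lambda k, t: -abs(t - manual[k])`, refinement pulls each coarse boundary onto the reference position as long as it lies inside the search window. A real system would also need to keep the refined boundaries in temporal order, which this sketch does not enforce.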