Automatic Conversion from Lexical Words to Prosodic Words for Mandarin Text-to-speech System
Yanqiu Shao, Jiqing Han, Ting Liu, Yongzhen Zhao
DOI: https://doi.org/10.1007/s10772-008-9013-5
2007-01-01
International Journal of Speech Technology
Abstract: In real speech, prosodic words (PWs), rather than lexical words (LWs), are the basic rhythmic units. The naturalness of a Text-to-Speech (TTS) system is directly influenced by how PWs are segmented. Most PWs are combinations of several LWs. In this paper, three Lexical Combination Models are proposed to combine LWs into PWs: a Directed Acyclic Graph Model, a Segmentation Model and a Markov Model (MM). To handle long LWs that should be segmented into two or more PWs, a Lexical Split Model (LSM) is applied to them. Experimental results show that the MM yields relatively stable results across different training data. The Transformation-Based Error-Driven Learning (TBED) algorithm, chosen for its high performance on individual properties, is applied in combination with the MM to improve the precision of PW segmentation. Experiments show that, among the three proposed models, the MM combined with TBED and LSM gives the best performance, achieving a precision of 93.00% and a recall of 93.23%. A perception test indicates that speech synthesized with PWs as the lowest prosodic units sounds more natural and acceptable than speech synthesized with LWs.
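To make the reported precision and recall figures concrete, the following is a minimal sketch of how such scores could be computed for a PW segmentation. It assumes a span-matching definition (a predicted PW counts as correct only if both of its boundaries match a reference PW), which is a common convention and is not confirmed by the abstract. The function names and the toy data, with Latin letters standing in for syllables, are hypothetical.

```python
def spans(units):
    """Convert a list of units into a set of (start, end) offsets."""
    result, pos = set(), 0
    for unit in units:
        result.add((pos, pos + len(unit)))
        pos += len(unit)
    return result


def pw_precision_recall(predicted, reference):
    """Span-based precision and recall (assumed metric, not the paper's stated one)."""
    if not predicted or not reference:
        raise ValueError("both segmentations must be non-empty")
    if "".join(predicted) != "".join(reference):
        raise ValueError("segmentations must cover the same text")
    pred_spans, ref_spans = spans(predicted), spans(reference)
    correct = len(pred_spans & ref_spans)
    return correct / len(pred_spans), correct / len(ref_spans)


# Toy example: each letter stands in for one syllable.
reference = ["ab", "cde", "f", "gh"]   # hypothetical hand-labelled PWs
predicted = ["ab", "cde", "fgh"]       # hypothetical system output
precision, recall = pw_precision_recall(predicted, reference)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

Here two of the three predicted PWs match the reference exactly, and two of the four reference PWs are recovered, so the two scores differ even on the same sentence.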