A Unified Framework for Multilingual Text-to-speech Synthesis with SSML Specification As Interface

Wu Zhiyong, Cao Guangqi, Meng M. Helen, Cai Lianhong
DOI: https://doi.org/10.1016/s1007-0214(09)70127-0
2009-01-01
Tsinghua Science & Technology
Abstract: This paper describes the design of a unified framework for Crystal, a multilingual text-to-speech (TTS) synthesis engine. The unified framework defines the common TTS modules for different languages and/or dialects. The interfaces between consecutive modules conform to the Speech Synthesis Markup Language (SSML) specification for standardization, interoperability, multilinguality, and extensibility. Detailed module divisions and implementation technologies for the unified framework are introduced, together with possible extensions for algorithm research and evaluation of TTS synthesis. Implementation of a mixed-language TTS system for Chinese Putonghua, Chinese Cantonese, and English demonstrates the feasibility of the proposed unified framework.
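The idea of using SSML as the interface between consecutive modules can be illustrated with a minimal sketch. This is not Crystal's implementation: the module names (`split_sentences`, `add_breaks`), the `PIPELINE` list, and the processing rules are hypothetical, chosen only to show each stage consuming an SSML document and emitting another SSML document. Only the `speak`, `s`, and `break` elements and the `xml:lang` attribute come from the SSML specification.

```python
import re
import xml.etree.ElementTree as ET

# Clark-notation name that ElementTree uses for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def text_to_ssml(text: str, lang: str) -> str:
    """Wrap raw text in a minimal SSML <speak> document."""
    speak = ET.Element("speak", {"version": "1.0", XML_LANG: lang})
    speak.text = text
    return ET.tostring(speak, encoding="unicode")


def split_sentences(ssml: str) -> str:
    """Hypothetical module 1: mark sentence boundaries with <s> elements."""
    root = ET.fromstring(ssml)
    raw = root.text or ""
    root.text = None
    # Naive rule, for illustration only: a sentence ends at '.', '!' or '?'.
    for sentence in re.findall(r"[^.!?]+[.!?]?", raw):
        if sentence.strip():
            ET.SubElement(root, "s").text = sentence.strip()
    return ET.tostring(root, encoding="unicode")


def add_breaks(ssml: str) -> str:
    """Hypothetical module 2: append a <break> to the end of each sentence."""
    root = ET.fromstring(ssml)
    for s in root.iter("s"):
        ET.SubElement(s, "break", {"strength": "medium"})
    return ET.tostring(root, encoding="unicode")


# Every module has the same signature (SSML in, SSML out), so modules
# can be reordered, replaced, or extended without touching the others.
PIPELINE = [split_sentences, add_breaks]


def run_pipeline(text: str, lang: str) -> str:
    doc = text_to_ssml(text, lang)
    for module in PIPELINE:
        doc = module(doc)
    return doc


if __name__ == "__main__":
    print(run_pipeline("Hello world. How are you?", "en-US"))
```

Because every stage shares one document format, a language-specific module (for example, a Cantonese text analyzer) could in principle be substituted for its Putonghua or English counterpart without changing neighboring stages, which is the kind of extensibility the abstract attributes to the SSML interface.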