Design and Implementation of a Multilingual Speech Synthesis Platform
XU Jun, CAI Lian-Hong, WU Zhi-Yong
2006-01-01
Abstract: Multilingual and mixed-lingual speech synthesis is becoming increasingly important for communication across nations. Addressing the key problems and the current state of research in this area, this paper proposes THMTTS, a new multilingual speech synthesis platform. The first part presents the system architecture. THMTTS comprises three parts: the basic data structure definition part, which provides a general data structure and an information logging mechanism; the module definition part, which lets researchers design and implement new speech synthesis algorithms; and Crystal Sonic, the graphical user interface (GUI) and main entry point for speech synthesis, which provides views of the data flow, debug information and module management, handles file I/O, and controls the wave-out device. We designed a multi-level data structure that places no restriction on its contents; the GUI calls a predefined enumeration method to iterate over all stored data and displays each item in a form that depends on its data type. Logs can be listed in the GUI and can also be written to files or other streams. Another feature of the system is flexible module composition. Every module implements the same interface and is built as a dynamic-link library (DLL). At system initialization, all modules stored in a designated location are loaded; users can then choose which modules to use and set the order in which they are linked. The second part discusses multilingual and mixed-lingual support. THMTTS aims to provide speech synthesis with language detection for four languages: Chinese, English, Japanese and Korean. The modular structure itself is well suited to supporting multiple languages. The current system also integrates modules that perform encoding conversion and language detection.
Language detection is based on Unicode, a general-purpose encoding for international use. The paper also proposes a statistical method, based on a sum of probabilities, for distinguishing between languages; the experimental results show it to be effective. In conclusion, the platform provides a general and flexible system architecture for speech analysis and synthesis. Based on this architecture, a basic flowchart for mixed-lingual language detection and speech synthesis is introduced. The proposed architecture makes it possible to improve the quality of mixed-lingual speech synthesis.
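
As a rough illustration of Unicode-based detection with a sum of probabilities, the following Python sketch assigns each character to a script by its code point, adds that script's per-language probabilities to a running total, and returns the language with the largest sum. The abstract does not give the actual algorithm, so everything here is an assumption made for illustration: the function names, the choice of Unicode ranges, and above all the probability values in SCRIPT_PRIORS, which are not figures from the paper.

```python
from collections import defaultdict

# Hypothetical per-script language probabilities. The values are
# illustrative assumptions, not figures reported in the paper.
SCRIPT_PRIORS = {
    "latin":    {"English": 1.0},
    "hiragana": {"Japanese": 1.0},
    "katakana": {"Japanese": 1.0},
    "hangul":   {"Korean": 1.0},
    # Han ideographs are shared, so the probability mass is split.
    "han":      {"Chinese": 0.6, "Japanese": 0.3, "Korean": 0.1},
}


def script_of(ch):
    """Map a character to a coarse script name using its Unicode code point."""
    cp = ord(ch)
    if 0x0041 <= cp <= 0x005A or 0x0061 <= cp <= 0x007A:
        return "latin"       # Basic Latin letters A-Z, a-z
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"    # Hiragana block
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"    # Katakana block
    if 0x1100 <= cp <= 0x11FF or 0xAC00 <= cp <= 0xD7AF:
        return "hangul"      # Hangul Jamo and Hangul Syllables blocks
    if 0x4E00 <= cp <= 0x9FFF:
        return "han"         # CJK Unified Ideographs block
    return None              # digits, punctuation, whitespace, other scripts


def detect_language(text):
    """Return the language with the highest summed probability, or None."""
    scores = defaultdict(float)
    for ch in text:
        script = script_of(ch)
        if script is None:
            continue  # characters that carry no language evidence
        for lang, p in SCRIPT_PRIORS[script].items():
            scores[lang] += p
    if not scores:
        return None
    return max(scores, key=scores.get)
```

In this sketch a Japanese sentence that mixes Han ideographs with kana is still classified as Japanese, because the kana contribute only to the Japanese total. Mixed-lingual input would additionally need the text to be segmented before each segment is scored, which the sketch does not attempt.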