Abstract: Multilingual and mixed-lingual speech synthesis is increasingly important for communication across nations. To address the key problems and the current state of research in this area, this paper proposes THMTTS, a new multilingual speech synthesis platform. The first part presents the system architecture. THMTTS comprises three parts: a basic data structure definition part, which provides a general data structure and an information logging mechanism; a module definition part, which lets researchers design and implement new speech synthesis algorithms; and Crystal Sonic, the graphical user interface (GUI) and main entry point for speech synthesis, which displays the data flow and debug information, manages modules, handles file I/O, and controls the wave-out device. We designed a multi-level data structure that does not restrict its contents; the GUI calls a predefined enumeration method to iterate over all stored data and displays each item differently depending on its data type. Logs can be listed in the GUI or written to files or other streams. Another feature of the system is smart module composition. All modules implement the same interface and are built as dynamic-link libraries (DLLs). During system initialization, all modules stored in a designated location are loaded; users can then manually choose which of them to use and set the order in which they are linked. The second part discusses multilingual and mixed-lingual support. THMTTS aims to provide speech synthesis with language detection for four languages: Chinese, English, Japanese, and Korean. The modular structure itself is well suited to supporting multiple languages. The current system also integrates modules that carry out encoding conversion and language detection.
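The module composition described above (a shared interface, loading at initialization, and a user-chosen linking order) can be illustrated with a minimal Python sketch. It is not the THMTTS implementation: the real modules are DLLs, whereas here each module is a plain function, and the names `ModuleRegistry`, `build_pipeline`, `strip_whitespace`, and `split_tokens` are hypothetical.

```python
from typing import Callable, Dict, List

# Assumed common interface: every module takes the shared data record
# and returns it, possibly with new fields added.
Module = Callable[[dict], dict]


class ModuleRegistry:
    """Holds all modules found at start-up (hypothetical stand-in for DLL loading)."""

    def __init__(self) -> None:
        self._modules: Dict[str, Module] = {}

    def register(self, name: str, module: Module) -> None:
        self._modules[name] = module

    def available(self) -> List[str]:
        # Names the user can choose from, in a stable order.
        return sorted(self._modules)

    def build_pipeline(self, order: List[str]) -> Module:
        # Look up the chosen modules; an unknown name raises KeyError.
        chain = [self._modules[name] for name in order]

        def run(data: dict) -> dict:
            for module in chain:
                data = module(data)
            return data

        return run


# Two toy modules that follow the assumed interface.
def strip_whitespace(data: dict) -> dict:
    data["text"] = data["text"].strip()
    data.setdefault("log", []).append("strip_whitespace")
    return data


def split_tokens(data: dict) -> dict:
    data["tokens"] = data["text"].split()
    data.setdefault("log", []).append("split_tokens")
    return data
```

A user would register the modules once, then pick a subset and an order, for example `registry.build_pipeline(["strip_whitespace", "split_tokens"])`, and pass a record such as `{"text": "  hello world "}` through the resulting function.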
Language detection is based on Unicode, a general-purpose encoding for international use. The paper also proposes a statistical method, based on the sum of probabilities, for distinguishing between languages; the experimental results show it to be effective. In conclusion, the platform provides a general and flexible system architecture for speech analysis and synthesis. Based on this architecture, a basic flowchart for mixed-lingual language detection and speech synthesis is introduced. The proposed architecture makes it possible to improve the quality of mixed-lingual speech synthesis.
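The abstract does not give the details of the detection method, so the following Python sketch shows only one way a Unicode-based, sum-of-probabilities detector could work. Each character contributes a probability to each candidate language according to its Unicode block, and the language with the largest total wins. The function names and the probability values for Han characters are assumptions chosen for illustration, not figures from the paper.

```python
from typing import Dict, Optional

LANGS = ("zh", "en", "ja", "ko")


def char_probs(ch: str) -> Dict[str, float]:
    """Return assumed per-language probabilities for one character."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x30FF:  # Hiragana and Katakana blocks
        return {"ja": 1.0}
    if 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF:  # Hangul syllables, Jamo
        return {"ko": 1.0}
    if 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs, shared across languages
        # Illustrative values only; a real system would estimate them from data.
        return {"zh": 0.6, "ja": 0.3, "ko": 0.1}
    if "a" <= ch <= "z" or "A" <= ch <= "Z":  # Basic Latin letters
        return {"en": 1.0}
    return {}  # digits, punctuation, and spaces carry no evidence


def detect_language(text: str) -> Optional[str]:
    """Sum the per-character probabilities and pick the highest-scoring language."""
    scores = {lang: 0.0 for lang in LANGS}
    for ch in text:
        for lang, p in char_probs(ch).items():
            scores[lang] += p
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Summing over the whole segment lets unambiguous characters, such as kana in a Japanese sentence, outweigh the Han characters that Chinese and Japanese share. For mixed-lingual input, the same function could be applied to each segment rather than to the whole text.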

Handheld Speech to Speech Translation System

Recent Advances of IBM’s Handheld Speech Translation System

A Hand-Held Speech-to-speech Translation System

Two-way speech-to-speech translation on handheld devices

IBM MASTOR System

IBM Mastor: Multilingual Automatic Speech-To-Speech Translator

Design of a speech interaction system for man-machine confrontation

The IBM Speech-to-speech Translation System for Smartphone: Improvements for Resource-Constrained Tasks.

A Hand-Held Multimedia Translation and Interpretation System with Application to Diet Management

End-to-end Code-switched TTS with Mix of Monolingual Recordings.

Speech to Text Conversion using Android Platform

Towards High Performance LVCSR in Speech-to-Speech Translation System on Smart Phones.

MARS: A Statistical Semantic Parsing and Generation-Based Multilingual Automatic Translation System

Speaker Independent Continuous Speech to Text Converter for Mobile Application

Towards Speech Translation of Non Written Languages

Social Messaging Application with Translation and Speech-to-Text Transformation

Design and Implementation of a Multilingual Speech Synthesis Platform

Embedded Speech Processing System for Car Wireless Terminals

A Mobile Phone based Speech Therapist

A System for Mandarin Short Phrase Recognition on Portable Devices