Grapheme-to-Phoneme Conversion in Mandarin Chinese Text-to-Speech System

Hongwei Ding, Oliver Jokisch
2009-01-01
Abstract: We present a lexicon-based model for segmenting Chinese text into dictionary entries and for providing pronunciations for these words. The approach combines a matching algorithm with several heuristic rules to resolve ambiguities. It achieves an overall accuracy above 95%, which shows it to be an effective solution to grapheme-to-phoneme conversion for Mandarin Chinese.

Introduction

Written Chinese text consists of strings of characters with no blanks to delimit words. The first step towards word-based indexing is to break a sequence of characters into words, a process called word segmentation. The word-segmentation problem cannot be bypassed, mainly because many Chinese characters are homographs whose pronunciation depends on the word they belong to.

The Problem of Word Segmentation

Word identification is difficult for several reasons. First, almost all characters are free morphemes: they can stand as one-character words by themselves, and they can also join other characters to form multi-character words. Second, compounding is the predominant word-formation device in modern Chinese, so it is difficult to tell whether a low-frequency compound is a word or a phrase. Third, the same pool of characters is also used to construct proper names, which makes personal names hard to identify [2].

Strategies in Word Segmentation

Existing methods for this problem can be classified into (1) purely statistical approaches [1]; (2) heuristic rule-based methods [2]; (3) statistical approaches that incorporate lexical knowledge [3]. Many statistical methods rely on a large pre-segmented text corpus for their analysis. The easiest and most effective method is the lexicon-based algorithm with supplementary rules. This is also the one adopted in our TTS system DRESS, modified to fit our system. The paper first introduces our synthesis system.
It then presents the solution to word identification and phonetic conversion. Finally, it points out possibilities for future research.

Synthesis System

The Mandarin Chinese text-to-speech system developed at TU Dresden is a syllable-based waveform concatenation synthesis system. It consists of text analysis and acoustic synthesis. The acoustic synthesis already achieves high naturalness. A syllable-based inventory takes cross-syllable co-articulation into consideration [4]. A neural network is responsible for learning and modifying duration and intonation [5]. Because grapheme-to-phoneme conversion had not been solved, word boundaries had to be inserted manually during synthesis. This paper presents a solution to word segmentation, which allows the whole text-to-speech system to operate automatically.

Word Segmentation

The word segmentation stage comprises a maximum matching algorithm over a word lexicon, several ambiguity resolution rules, and procedures for handling time and numeral expressions and for identifying personal names.

Figure 1: Grapheme-phoneme conversion. [Diagram labels: Input text (a string of Chinese characters); Word Identification; Word Lexicon; Ambiguity Resolution Rules; Time & Numeral Expressions; Name Identity; Segmented Words in Characters; Grapheme-Phoneme Conversion; Word Lexicon with Phonetic Transcription; Phonetic Sequences with Tones; Prosodic Generation]

Maximum Path-Matching

The lexicon-based approach to word identification is matching: the basic strategy is to match the input character string against a large set of entries stored in a pre-compiled lexicon to find all (or some) of the possible segmentations. A variant of maximal matching proposed in [2] takes the most plausible segmentation to be the three-word chunk with maximal length. This algorithm is adopted in our system.

CFA/DAGA'04, Strasbourg, 22-25/03/2004
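
The three-word-chunk variant of maximal matching and the lexicon lookup for pronunciation described above can be illustrated with a minimal Python sketch. It is not the DRESS implementation: the toy lexicon, the function names, and the tie-breaking rule (among chunks of equal length, prefer the one with fewer words) are assumptions made for illustration, standing in for the ambiguity resolution rules that the text mentions but does not spell out.

```python
from typing import Dict, List

# Toy lexicon (hypothetical entries): word -> pinyin with tone numbers.
LEXICON: Dict[str, str] = {
    "和": "he2", "和服": "he2 fu2", "服": "fu2", "务": "wu4", "员": "yuan2",
    "服务": "fu2 wu4", "服务员": "fu2 wu4 yuan2", "说": "shuo1", "话": "hua4",
    "说话": "shuo1 hua4", "行": "xing2", "人": "ren2", "行人": "xing2 ren2",
    "在": "zai4", "银": "yin2", "银行": "yin2 hang2",
}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def candidate_words(text: str, start: int) -> List[str]:
    """Lexicon words beginning at `start`; a single character is always allowed."""
    words = [text[start:start + 1]]
    for n in range(2, MAX_WORD_LEN + 1):
        piece = text[start:start + n]
        if len(piece) == n and piece in LEXICON:
            words.append(piece)
    return words


def chunks(text: str, start: int, depth: int = 3) -> List[List[str]]:
    """All chunks of up to `depth` consecutive words beginning at `start`."""
    if start >= len(text) or depth == 0:
        return [[]]
    result = []
    for word in candidate_words(text, start):
        for rest in chunks(text, start + len(word), depth - 1):
            result.append([word] + rest)
    return result


def segment(text: str) -> List[str]:
    """Emit the first word of the longest three-word chunk, then advance."""
    words: List[str] = []
    pos = 0
    while pos < len(text):
        # Assumed tie-break: equal total length -> prefer fewer words.
        best = max(chunks(text, pos),
                   key=lambda c: (sum(len(w) for w in c), -len(c)))
        words.append(best[0])
        pos += len(best[0])
    return words


def to_pinyin(words: List[str]) -> List[str]:
    """Look up each segmented word; unknown words get a placeholder."""
    return [LEXICON.get(word, "<unk>") for word in words]


if __name__ == "__main__":
    print(segment("和服务员说话"))           # ['和', '服务员', '说话']
    print(to_pinyin(segment("行人在银行")))  # ['xing2 ren2', 'zai4', 'yin2 hang2']
```

In the first sentence a greedy longest-match pass would take 和服 as the first word, whereas the three-word chunk 和 / 服务员 / 说话 covers more characters and is chosen instead. The second sentence shows why segmentation precedes pronunciation: the homograph 行 is read xing2 in 行人 but hang2 in 银行, and the reading follows from the word the character is assigned to.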