Cross-Lingual Speaker Adaptation for HMM-Based Speech Synthesis

Yi-Jian Wu, Simon King, Keiichi Tokuda
DOI: https://doi.org/10.1109/chinsl.2008.ecp.14
2009-01-01
Abstract: This paper explores a cross-lingual speaker adaptation technique for HMM-based speech synthesis, in which a source voice model for English is transformed into a target speaker model using Mandarin Chinese speech data from the target speaker. A phone mapping-based method is adopted to map Chinese Initials/Finals to English phonemes, and two types of mapping rules, one-to-one and one-to-sequence, are compared. To avoid having to map prosodic features between languages, the adaptation procedure constructs regression classes and transforms for triphone models, then uses them to adapt the phonetic-and-prosodic-context-dependent models. The experimental results show that a one-to-sequence phone mapping is better than a one-to-one mapping, and that the similarity between the adapted English speech and the target Chinese speaker is reasonable.
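The difference between the two mapping rules compared in the abstract can be illustrated with a minimal Python sketch. The mapping tables, the phone symbols, and the function names below are all hypothetical and chosen only for illustration; they are not the mapping rules used in the paper.

```python
from typing import Dict, List

# Hypothetical mapping tables (illustration only, NOT from the paper).
# Keys are Mandarin Initials/Finals; values are English phone symbols.
ONE_TO_ONE: Dict[str, str] = {
    "m": "m",
    "ai": "ay",
    "iao": "aw",  # a compound Final forced onto a single English phone
}

ONE_TO_SEQUENCE: Dict[str, List[str]] = {
    "m": ["m"],
    "ai": ["ay"],
    "iao": ["y", "aw"],  # the same Final expanded into a phone sequence
}


def map_one_to_one(units: List[str]) -> List[str]:
    """Replace each Mandarin unit with exactly one English phone."""
    return [ONE_TO_ONE[u] for u in units]


def map_one_to_sequence(units: List[str]) -> List[str]:
    """Replace each Mandarin unit with one or more English phones."""
    return [phone for u in units for phone in ONE_TO_SEQUENCE[u]]


if __name__ == "__main__":
    syllable = ["m", "iao"]  # Initial + Final of one syllable
    print(map_one_to_one(syllable))
    print(map_one_to_sequence(syllable))
```

In this sketch, a one-to-one rule always yields as many English phones as there are Mandarin units, whereas a one-to-sequence rule can yield more, which lets a compound Final be represented by several English phones.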