Duration optimization of speaker adaptation in Mandarin TTS

Yongjin So, Jia Jia, Lianhong Cai
DOI: https://doi.org/10.16511/j.cnki.qhdxxb.2013.11.012
2013-01-01
Affiliation: Computer Science and Technology Department, Tsinghua University, Beijing 100084, China
Abstract: In Mandarin TTS, the durations of the unvoiced and voiced phonemes in a syllable are an important factor in the naturalness of synthesized speech. They are also a personalized feature that is closely tied to the speaker. This paper proposes an unvoiced/voiced duration optimization approach for speaker adaptation in HMM-based Mandarin TTS. The relative duration of the unvoiced part of each syllable in the source speaker's corpus is clustered by context features using a decision tree. This decision tree is then adapted to the target speaker using the relative duration of the unvoiced part in the adaptation data. In synthesis, a reference relative duration of the unvoiced part for the target speaker is generated from the adapted decision tree, and the durations of the unvoiced and voiced parts in the synthesized speech are adjusted accordingly. Experiments show that this approach improves the accuracy of duration prediction in speaker adaptation for HMM-based Mandarin TTS, and that it effectively improves both the speaker similarity and the naturalness of the synthesized speech.
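The adjustment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the relative duration is assumed to be the unvoiced duration divided by the total syllable duration, and the adjustment is assumed to keep the syllable's total duration fixed while redistributing it between the unvoiced and voiced parts. The abstract does not confirm any of these details.

```python
def relative_unvoiced_duration(unvoiced_ms: float, voiced_ms: float) -> float:
    """Share of a syllable's duration taken by its unvoiced part.

    Assumption: "relative duration" means unvoiced / (unvoiced + voiced).
    """
    total = unvoiced_ms + voiced_ms
    if total <= 0:
        raise ValueError("syllable duration must be positive")
    return unvoiced_ms / total


def adjust_syllable(unvoiced_ms: float, voiced_ms: float,
                    reference_ratio: float) -> tuple[float, float]:
    """Redistribute a syllable's duration to match a reference ratio.

    Assumption: the total syllable duration is preserved and only the
    unvoiced/voiced split changes. `reference_ratio` stands in for the
    value that the adapted decision tree would predict.
    """
    if not 0.0 <= reference_ratio <= 1.0:
        raise ValueError("reference_ratio must lie in [0, 1]")
    total = unvoiced_ms + voiced_ms
    new_unvoiced = total * reference_ratio
    return new_unvoiced, total - new_unvoiced


if __name__ == "__main__":
    # Hypothetical syllable: 60 ms unvoiced, 140 ms voiced, target ratio 0.25.
    print(relative_unvoiced_duration(60.0, 140.0))
    print(adjust_syllable(60.0, 140.0, 0.25))
```

Keeping the total fixed is only one possible design choice; it leaves the syllable-level duration predicted by the adapted HMM untouched and changes only the internal split.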