The duration optimization of speaker adaptation in Mandarin TTS

SO Yongjin, JIA Jia, CAI Lianhong (Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China)

Abstract: In Mandarin TTS, the durations of the unvoiced and voiced phonemes within a syllable strongly affect the naturalness of the synthesized speech. They are also a personalized feature that is closely tied to the individual speaker. This paper proposes an unvoiced/voiced duration optimization approach for speaker adaptation in HMM-based Mandarin TTS. The relative duration of the unvoiced part of each syllable in the source speaker's corpus is clustered by a decision tree using context features. This decision tree is then adapted to the target speaker using the relative unvoiced durations in the adaptation data. At synthesis time, a reference relative unvoiced duration for the target speaker is generated from the adapted tree, and the durations of the unvoiced and voiced parts of the synthesized speech are adjusted accordingly. Experiments show that this approach improves the accuracy of duration prediction in speaker adaptation for HMM-based Mandarin TTS, and that it effectively improves both the speaker similarity of the adapted voice and the naturalness of the synthesized speech.
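The pipeline outlined in the abstract (cluster the relative unvoiced duration, adapt the tree to the target speaker, then rescale each syllable) can be sketched in Python as below. This is a minimal illustration, not the authors' implementation, and several details are assumptions not stated in the abstract: the relative unvoiced duration is taken as d_u / (d_u + d_v); the decision tree is a fixed, hand-written stand-in (`leaf_id`) with one hypothetical question about the initial consonant class, rather than a tree grown from data; adaptation uses a MAP-style interpolation of each leaf mean with a hypothetical prior weight `tau`; and the adjustment keeps the syllable's total duration fixed while redistributing it between the unvoiced and voiced parts. All function names and context features are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Hypothetical context features: (initial consonant class, tone).
Context = Tuple[str, int]
# One sample: (context, unvoiced duration in ms, voiced duration in ms).
Sample = Tuple[Context, float, float]


def relative_unvoiced(unvoiced: float, voiced: float) -> float:
    """Relative duration of the unvoiced part of a syllable: d_u / (d_u + d_v)."""
    return unvoiced / (unvoiced + voiced)


def leaf_id(ctx: Context) -> str:
    """Fixed stand-in for a trained decision tree (assumption): a single
    question on the initial consonant class routes a context to a leaf."""
    initial_class, _tone = ctx
    if initial_class in {"aspirated_stop", "affricate", "fricative"}:
        return "long_unvoiced"
    return "short_unvoiced"


@dataclass
class Leaf:
    mean: float  # mean relative unvoiced duration of the cluster
    count: int   # number of samples that contributed to the mean


def _accumulate(data: Iterable[Sample]):
    """Sum and count relative unvoiced durations per leaf."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for ctx, unvoiced, voiced in data:
        key = leaf_id(ctx)
        sums[key] += relative_unvoiced(unvoiced, voiced)
        counts[key] += 1
    return sums, counts


def train_leaves(source: Iterable[Sample]) -> Dict[str, Leaf]:
    """Leaf statistics estimated from the source speaker's corpus."""
    sums, counts = _accumulate(source)
    return {k: Leaf(sums[k] / counts[k], counts[k]) for k in sums}


def adapt_leaves(leaves: Dict[str, Leaf], target: Iterable[Sample],
                 tau: float = 10.0) -> Dict[str, Leaf]:
    """MAP-style adaptation (assumption): interpolate each source leaf mean
    with the target speaker's data; leaves without target data keep their mean."""
    sums, counts = _accumulate(target)
    adapted = {}
    for key, leaf in leaves.items():
        n = counts.get(key, 0)
        mean = (tau * leaf.mean + sums.get(key, 0.0)) / (tau + n)
        adapted[key] = Leaf(mean, leaf.count + n)
    return adapted


def adjust_durations(ctx: Context, unvoiced: float, voiced: float,
                     leaves: Dict[str, Leaf]) -> Tuple[float, float]:
    """Split the syllable's total duration (kept fixed, an assumption) using
    the reference relative unvoiced duration from the adapted tree."""
    total = unvoiced + voiced
    ratio = leaves[leaf_id(ctx)].mean
    return ratio * total, (1.0 - ratio) * total
```

For example, if the source corpus gives a mean of 0.35 for the `long_unvoiced` leaf and the adaptation data contains two such syllables with a relative unvoiced duration of 0.5, then with `tau=2.0` the adapted mean is (2 × 0.35 + 1.0) / 4 = 0.425, so a predicted 300 ms syllable is split into 127.5 ms unvoiced and 172.5 ms voiced. The MAP-style update is only one plausible choice: it lets leaves with little target data stay close to the source speaker while well-covered leaves move toward the target speaker.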

Duration Modeling for Chinese Synthesis from C-ToBI Labeled Corpus

Using Different Models to Label the Break Indices for Mandarin Speech Synthesis

Comparison of Approaches for Predicting Break Indices in Mandarin Speech Synthesis

The Duration Analysis of the Checked Tones in Cantonese Speech

An Unvoiced/Voiced Duration Adjustment Algorithm Based on Context Features in Mandarin TTS

The Pause Duration Prediction for Mandarin Text-to-Speech System

Chinese Prosody Generation Based on C-ToBI Representation for Text-to-Speech

Improving Prosodic Boundaries Prediction for Mandarin Speech Synthesis by Using Enhanced Embedding Feature and Model Fusion Approach

Automatic Prosodic Boundary Labeling Based on Fusing the Silence Duration with the Lexical Features

Unsupervised Prosodic Phrase Boundary Labeling of Mandarin Speech Synthesis Database Using Context-Dependent HMM

Duration Model for Post-Processing in a Mandarin Speech Recognition System

Prosodic Modeling with Rich Syntactic Context in HMM-based Mandarin Speech Synthesis

Investigating Efficient Feature Representation Methods and Training Objective for BLSTM-Based Phone Duration Prediction

Acoustic and Linguistic Information Based Chinese Prosodic Boundary Labelling

Automatic Phrase Boundary Labeling for Mandarin TTS Corpus Using Context-Dependent HMM

Duration Analysis of Quadric-Syllabic Prosodic Words in Putonghua

Chinese Prosodic Phrasing with Extended Features

A Superposed Prosodic Model for Chinese Text-to-Speech Synthesis

Prosodic Word Boundaries Prediction for Mandarin Text-to-Speech

A Maximum Entropy Based Hierarchical Model for Automatic Prosodic Boundary Labeling in Mandarin