Abstract: This paper presents a decision tree pruning method for model clustering in HMM-based parametric speech synthesis, using cross-validation (CV) under the minimum generation error (MGE) criterion. Decision-tree-based model clustering is an important component of the training process of an HMM-based speech synthesis system. Conventionally, the maximum likelihood (ML) criterion is used to choose the optimal contextual question from the question set at each tree node split, and the minimum description length (MDL) principle serves as the stopping criterion to prevent overly large trees. However, the MDL criterion is derived under an asymptotic assumption and is theoretically problematic when the training data set is not large enough. In addition, the MDL criterion is inconsistent with the aim of speech synthesis. This paper therefore proposes a minimum cross generation error (MCGE) based decision tree pruning method for HMM-based speech synthesis. The initial decision tree is trained by MDL clustering with a factor estimated by cross-validation under the MCGE criterion. The tree size is then tuned by iteratively backing off or splitting each leaf node to minimize the cross generation error, defined as the sum of the generation errors calculated for all training sentences using cross-validation. Objective and subjective evaluation results show that the proposed method significantly outperforms the conventional MDL-based model clustering method.
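
As a rough illustration of the cross generation error described in the abstract, the Python sketch below sums held-out generation errors over K folds: each fold's sentences are scored with a model trained on the remaining folds. Everything named here is an assumption made for illustration and is not taken from the paper: the function names (`cross_generation_error`, `train_mean`, `squared_error`), the interleaved fold assignment, and the toy model that "generates" a single mean value in place of HMM parameter generation.

```python
from typing import Callable, List, Sequence

# Toy stand-in for one sentence's acoustic feature trajectory (assumption).
Sentence = Sequence[float]


def cross_generation_error(
    sentences: List[Sentence],
    k: int,
    train: Callable[[List[Sentence]], float],
    generation_error: Callable[[float, Sentence], float],
) -> float:
    """Sum the generation errors of all sentences, each scored by a model
    trained without that sentence's fold (K-fold cross-validation)."""
    total = 0.0
    for fold in range(k):
        # Interleaved fold assignment: sentence i belongs to fold i % k.
        held_out = [s for i, s in enumerate(sentences) if i % k == fold]
        training = [s for i, s in enumerate(sentences) if i % k != fold]
        model = train(training)
        total += sum(generation_error(model, s) for s in held_out)
    return total


def train_mean(training: List[Sentence]) -> float:
    """Hypothetical 'model': the mean of all training frames."""
    frames = [x for s in training for x in s]
    return sum(frames) / len(frames)


def squared_error(model: float, sentence: Sentence) -> float:
    """Hypothetical generation error: squared distance between the
    generated (constant) trajectory and the natural one."""
    return sum((x - model) ** 2 for x in sentence)


data = [[1.0, 1.0], [3.0, 3.0], [1.0, 3.0], [2.0, 2.0]]
cge = cross_generation_error(data, k=2, train=train_mean,
                             generation_error=squared_error)
print(cge)
```

Under the same assumptions, a candidate tree size (or MDL factor) could be compared against another by evaluating this quantity for each and keeping the one with the smaller value.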

Minimum Unit Selection Error Training for HMM-based Unit Selection Speech Synthesis System

Trainable Unit Selection Speech Synthesis under Statistical Framework

Minimum Generation Error Training for HMM-Based Speech Synthesis

HMM-Based Hierarchical Unit Selection Combining Kullback-Leibler Divergence with Likelihood Criterion

HMM-based Unit Selection Using Frame Sized Speech Segments

Statistical Acoustic Model Based Unit Selection Algorithm for Speech Synthesis

Optimization Method for Unit Selection Speech Synthesis Based on Synthesis Quality Predictions

HMM-based Unit Selection Speech Synthesis Using Log Likelihood Ratios Derived from Perceptual Data

HMM training method based on evolutionary computation and MDI in speech recognition

Unit Selection Speech Synthesis Integrating Automatic Error Detection

Building HMM based unit-selection speech synthesis system using synthetic speech naturalness evaluation score

Statistical modeling of syllable-level F0 features for HMM-based unit selection speech synthesis

Minimum Generation Error Training for HMM-based Prediction of Articulatory Movements

Minimum generation error training with weighted Euclidean distance on LSP for HMM-based speech synthesis

Model Adaptation for HMM-Based Speech Synthesis under Minimum Generation Error Criterion

Full HMM Training for Minimizing Generation Error in Synthesis

Learning and Modeling Unit Embeddings for Improving HMM-based Unit Selection Speech Synthesis

Minimum Generation Error Training with Direct Log Spectral Distortion on LSPs for HMM-Based Speech Synthesis

Cross-Validation and Minimum Generation Error Based Decision Tree Pruning for HMM-based Speech Synthesis