Modeling Pitch Trajectory by Hierarchical HMM with Minimum Generation Error Training.

Yi-Jian Wu, Frank Soong
DOI: https://doi.org/10.1109/icassp.2012.6288799
2012-01-01
Abstract: A hierarchical pitch model (HPM) was recently proposed for HMM-based speech synthesis. In HPM, the pitch trajectory is modeled as an additive combination of hierarchical layers (state, phone, syllable, etc.), and a minimum generation error (MGE) criterion is used to re-estimate the model parameters. In this paper, we extend the MGE criterion to the tree-based model clustering process so that the context-dependent models at all layers are clustered simultaneously, and we construct a full MGE training process for HPM. Experiments were conducted to investigate the effects of different training criteria and different hierarchical layer combinations on HPM. Experimental results show that, on test data, full MGE training significantly improves HPM's ability to predict the F0 trajectory in TTS over the ML-based approach. The new HPM also outperforms the conventional state-level HMM in F0 prediction.
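To make the additive-layer idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes each layer contributes a piecewise-constant value per unit (syllable, phone, state), and that the generation error is a mean squared difference between generated and reference trajectories. It ignores variances, dynamic features, and decision-tree clustering. All function names and numeric values are hypothetical.

```python
def expand_layer(unit_means, unit_durations):
    """Repeat each unit's mean for the number of frames the unit spans."""
    frames = []
    for mean, duration in zip(unit_means, unit_durations):
        frames.extend([mean] * duration)
    return frames


def additive_f0(layers):
    """Sum the frame-level contributions of all layers (assumed additive model)."""
    expanded = [expand_layer(means, durations) for means, durations in layers]
    n_frames = len(expanded[0])
    if any(len(layer) != n_frames for layer in expanded):
        raise ValueError("all layers must cover the same number of frames")
    return [sum(values) for values in zip(*expanded)]


def generation_error(generated, reference):
    """Mean squared error between generated and reference trajectories
    (an assumed stand-in for the generation error that MGE minimizes)."""
    if len(generated) != len(reference):
        raise ValueError("trajectories must have the same length")
    return sum((g - r) ** 2 for g, r in zip(generated, reference)) / len(reference)


# Hypothetical toy values on a log-F0 scale; not taken from the paper.
syllable_layer = ([5.0, 5.2], [4, 2])            # 2 syllables over 6 frames
phone_layer = ([0.1, -0.1, 0.05], [2, 2, 2])     # 3 phones over 6 frames
state_layer = ([0.0, 0.02, -0.02, 0.01, 0.0, -0.01], [1] * 6)  # 6 states

f0 = additive_f0([syllable_layer, phone_layer, state_layer])
reference = [5.1, 5.1, 4.9, 4.9, 5.25, 5.25]     # hypothetical target contour
error = generation_error(f0, reference)
```

In this simplified picture, the syllable layer carries the coarse contour while the phone and state layers add finer corrections. MGE-style training would adjust the per-unit values in every layer to reduce `error`, rather than maximizing likelihood as in ML training.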