Overview of NIT HMM-based speech synthesis system for Blizzard Challenge 2009

Keiichiro Oura, Yi-Jian Wu, Keiichi Tokuda
2009-01-01
Abstract: We describe a hidden Markov model (HMM)-based speech synthesis system developed at the Nagoya Institute of Technology (NIT) for Blizzard Challenge 2009. We incorporated several state-of-the-art technologies into this system, including the Speech Transformation and Representation using Adaptive Interpolation of weiGHTed spectrum (STRAIGHT) vocoder, minimum generation error (MGE) training, phone duration modeling, a parameter generation algorithm considering global variance, and line spectral pair (LSP)-based formant enhancement. At runtime, the system synthesizes speech at around 0.3 xRT (real-time ratio), and its footprint is less than 25 MB. The results of listening tests showed that the overall speech quality and intelligibility of our system were better than those of most other systems, especially when better labeling was available for the speech corpus.