HMM based speech synthesis with Global Variance Training method

Jianhua Tao
DOI: https://doi.org/10.1109/IUCS.2010.5666649
2010-01-01
Abstract: Although Hidden Markov Model based speech synthesis has been shown to perform well, several factors still degrade the quality of synthesized speech: the vocoder, model accuracy, and over-smoothing. Experimental results show that over-smoothing in the frequency domain mainly affects the quality of synthesized speech, whereas over-smoothing in the time domain can nearly be ignored. Time-domain over-smoothing is generally caused by inaccuracy in the model structure, and frequency-domain over-smoothing by inaccuracy in the training algorithm. The ML-estimation based parameter training algorithm causes perceptual distortion in synthesized speech. The talk will introduce a Global Variance (GV) based training method into the HTS training structure. The new method tries to enlarge the variance of the generated spectrum and F0. The experiments show that the method improves synthesis performance in both voice quality and expressiveness.
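As a rough illustration of the quantity the abstract refers to, the global variance of a parameter trajectory can be taken as its variance over a whole utterance, and over-smoothing shows up as a drop in that value. The sketch below is not the training method introduced in the talk: it uses a synthetic one-dimensional trajectory, a moving average as a crude stand-in for over-smoothing, and a simple post-hoc variance rescaling. All function names and the test signal are hypothetical and chosen only for illustration.

```python
import math
import random


def global_variance(trajectory):
    """Variance of a 1-D parameter trajectory over a whole utterance."""
    mean = sum(trajectory) / len(trajectory)
    return sum((x - mean) ** 2 for x in trajectory) / len(trajectory)


def moving_average(trajectory, width=5):
    """Crude stand-in for over-smoothing: average over a sliding window."""
    half = width // 2
    out = []
    for t in range(len(trajectory)):
        window = trajectory[max(0, t - half): t + half + 1]
        out.append(sum(window) / len(window))
    return out


def scale_to_target_gv(trajectory, target_gv):
    """Rescale around the mean so the trajectory's GV matches target_gv."""
    mean = sum(trajectory) / len(trajectory)
    gv = global_variance(trajectory)
    if gv == 0.0:
        return list(trajectory)  # a constant trajectory cannot be rescaled
    factor = math.sqrt(target_gv / gv)
    return [mean + factor * (x - mean) for x in trajectory]


# Synthetic "natural" trajectory (an assumption, not data from the paper).
random.seed(0)
natural = [math.sin(0.3 * t) + random.gauss(0.0, 0.3) for t in range(200)]
smoothed = moving_average(natural)
restored = scale_to_target_gv(smoothed, global_variance(natural))

print("GV natural :", global_variance(natural))
print("GV smoothed:", global_variance(smoothed))
print("GV restored:", global_variance(restored))
```

The smoothed trajectory has a smaller global variance than the original, and rescaling restores the variance without recovering the lost detail. That limitation is one reason to account for variance during training or parameter generation rather than only afterwards.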