Global Variance Modeling on the Log Power Spectrum of LSPs for HMM-based Speech Synthesis

Zhen-Hua Ling, Yu Hu, Li-Rong Dai
DOI: https://doi.org/10.1109/iscslp.2014.6936612
2012-01-01
Abstract: This paper uses the global variance (GV) of the log power spectrum (LPS) derived from mel-cepstrum to improve hidden Markov model (HMM) based parametric speech synthesis. To alleviate over-smoothing of the generated spectral structures, our previous work proposed an LPS-GV modeling method using line spectral pairs (LSPs), in which the estimated distribution of LPS-GV was combined with the trained acoustic model to determine the optimal spectral features at synthesis time. In this paper, we extend this method to the case where mel-cepstral coefficients are used as spectral features. Further, we propose integrating LPS-GV distortions into the minimum generation error (MGE) training criterion, in order to avoid the high computational complexity of the parameter generation algorithm with a GV model. Experimental results show that, when mel-cepstral features are adopted, the parameter generation algorithm using the LPS-GV model produces more natural acoustic features than the conventional GV modeling method. Moreover, integrating LPS-GV distortions into the model training criterion achieves performance similar to that of applying the LPS-GV model at synthesis time.
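As a rough illustration of the quantity the abstract refers to, the sketch below computes a per-frequency-bin global variance of the log power spectrum from a sequence of cepstral frames. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`log_power_spectrum`, `global_variance`, `lps_gv`) are hypothetical, the frequency grid is assumed to be uniform over [0, pi], mel frequency warping is ignored (so the spectrum is evaluated on the warped axis), and the variance is taken as the population variance over the frames of one utterance.

```python
import math


def log_power_spectrum(cepstrum, num_bins):
    """Log power spectrum of one frame at num_bins frequencies in [0, pi].

    Assumes log|H(e^{jw})|^2 = 2 * sum_m c_m * cos(m * w) for cepstral
    coefficients c_0..c_M. Frequency warping is ignored (assumption).
    num_bins must be at least 2.
    """
    lps = []
    for k in range(num_bins):
        omega = math.pi * k / (num_bins - 1)
        value = 2.0 * sum(c * math.cos(m * omega) for m, c in enumerate(cepstrum))
        lps.append(value)
    return lps


def global_variance(frames):
    """Per-dimension population variance across all frames of an utterance."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dims)]


def lps_gv(cepstral_frames, num_bins=8):
    """GV of the log power spectrum: variance over frames at each frequency bin."""
    lps_frames = [log_power_spectrum(c, num_bins) for c in cepstral_frames]
    return global_variance(lps_frames)


if __name__ == "__main__":
    # Two toy frames; a flattened (over-smoothed) copy moves halfway to the mean.
    natural = [[1.0, 0.5], [3.0, -0.5]]
    smoothed = [[1.5, 0.25], [2.5, -0.25]]
    print(lps_gv(natural, num_bins=4))
    print(lps_gv(smoothed, num_bins=4))
```

In this toy setup, shrinking each frame toward the utterance mean lowers the LPS-GV at every bin, which is the over-smoothing symptom that a GV model is meant to counteract.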