Minimum generation error training with weighted Euclidean distance on LSP for HMM-based speech synthesis

Ming Lei, Zhen-Hua Ling, Li-Rong Dai
DOI: https://doi.org/10.1109/ICASSP.2010.5495688
2010-01-01
ICASSP
Abstract: This paper presents a minimum generation error (MGE) training method that uses a weighted Euclidean distance measure on line spectral pairs (LSP) for HMM-based speech synthesis. The weighted Euclidean distance on LSP is introduced as the measure of generation error to improve the consistency between the model training criterion and the subjective perception of distortion in synthetic speech. Several common weighting techniques are investigated and compared within the MGE training framework. The experimental results show that the formant bounded weighting (FBW) method performs best, significantly improving the naturalness of synthetic speech compared with the Euclidean LSP distance measure. Compared with MGE training using the log spectral distortion (LSD) measure, the FBW criterion achieves similar naturalness with much lower computational complexity in model training.
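To make the idea of a weighted Euclidean distance on LSP concrete, here is a minimal sketch. The abstract does not define the FBW weights, so the example uses inverse harmonic mean weighting as a stand-in; that is an assumption for illustration, and whether it is among the techniques the paper compares is not stated here. The function names, the boundary values 0 and pi, and the sample LSP values are all hypothetical.

```python
import math


def ihm_weights(lsp):
    """Inverse harmonic mean weights for one frame of LSP frequencies (radians).

    Assumed stand-in weighting, not the paper's FBW:
        w_i = 1 / (l_i - l_{i-1}) + 1 / (l_{i+1} - l_i)
    with l_0 = 0 and l_{p+1} = pi as assumed boundary values.
    Closely spaced LSPs, which tend to mark spectral peaks, get larger weights.
    """
    padded = [0.0] + list(lsp) + [math.pi]
    return [
        1.0 / (padded[i] - padded[i - 1]) + 1.0 / (padded[i + 1] - padded[i])
        for i in range(1, len(padded) - 1)
    ]


def weighted_lsp_distance(natural, generated, weights=None):
    """Weighted squared Euclidean distance: sum_i w_i * (c_i - g_i)^2.

    With weights=None every dimension gets weight 1, which reduces to the
    plain squared Euclidean distance.
    """
    if weights is None:
        weights = [1.0] * len(natural)
    return sum(w * (c - g) ** 2 for w, c, g in zip(weights, natural, generated))


# Hypothetical 4th-order LSP frame and a generated frame to compare against it.
natural = [0.30, 0.35, 1.20, 2.00]
generated = [0.32, 0.37, 1.25, 2.05]

plain = weighted_lsp_distance(natural, generated)
weighted = weighted_lsp_distance(natural, generated, ihm_weights(natural))
print(plain, weighted)
```

In this sketch the weights are computed from the natural frame, so errors on the closely spaced pair (0.30, 0.35) count for more than equally sized errors elsewhere. In an MGE setting, a distance of this kind would take the place of the plain Euclidean generation error in the training criterion.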