Objective Distance Measures for Assessing Concatenative Speech Synthesis

Jing-Dong Chen, Nick Campbell
DOI: https://doi.org/10.21437/eurospeech.1999-157
1999-01-01
Abstract: Several different acoustic transforms of the speech signal are compared for use in the assessment and evaluation of concatenative speech synthesis. The transforms tested include LPC, LSP, MFCC, bispectrum, Mellin transform of the log spectrum, Wigner-Ville distribution (WVD), etc. The computed distances between a synthesised utterance and a naturally spoken version of the same sentence are compared by correlation with perceptually based scores obtained from a mean opinion score (MOS) evaluation. The results show that the distances computed using the bispectrum have the highest degree of correlation with the MOS score. Both the RMFCC and the LPC outperform the MFCC and the LPCC. The WVD-based cepstrum is found to behave poorly in this task.
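The evaluation procedure described above (one distance per sentence between the synthesised and natural versions, then a correlation of those distances with MOS) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: feature extraction (LPC, MFCC, bispectrum, etc.) is omitted, and each utterance is assumed to already be a list of per-frame feature vectors. The Euclidean frame distance, the dynamic-time-warping (DTW) alignment, and the hypothetical helper names `frame_distance`, `dtw_distance` and `pearson` are illustrative choices, not details taken from the paper.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one feature vector (e.g. cepstral coefficients) per analysis frame


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(synth: List[Frame], natural: List[Frame]) -> float:
    """Average frame distance along a DTW path.

    DTW is an assumed alignment step: the synthesised and natural
    utterances generally differ in length and timing.
    """
    n, m = len(synth), len(natural)
    if n == 0 or m == 0:
        raise ValueError("both utterances need at least one frame")
    inf = float("inf")
    # cost[i][j] = (accumulated distance, path length) of the best path ending at frames (i-1, j-1)
    cost = [[(inf, 0)] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0.0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(synth[i - 1], natural[j - 1])
            # Extend the cheapest predecessor (insertion, deletion or match).
            best = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1], key=lambda c: c[0])
            cost[i][j] = (best[0] + d, best[1] + 1)
    total, steps = cost[n][m]
    return total / steps  # normalise by path length so long sentences are not penalised


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy data: three sentence pairs with 1-D "features" and made-up MOS values.
    natural = [[[0.0], [1.0], [2.0]], [[0.0], [2.0]], [[1.0], [1.0], [3.0]]]
    synth = [[[0.0], [1.0], [2.5]], [[1.0], [3.0]], [[1.0], [2.0], [1.0]]]
    mos = [4.5, 3.0, 2.0]
    dists = [dtw_distance(s, nat) for s, nat in zip(synth, natural)]
    print("distances:", dists)
    print("Pearson r with MOS:", pearson(dists, mos))
```

Since a larger distance should indicate a less natural synthesis, a useful measure would be expected to correlate negatively with MOS; comparing the magnitude of r across feature types is one way to rank the transforms.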