Objective Evaluation Methods for Chinese Text-To-Speech Systems

Teng Zhang, Zhipeng Chen, Ji Wu, Sam Lail, Wenhui Lei, Carsten Isert
DOI: https://doi.org/10.21437/interspeech.2016-421
2016-01-01
Abstract: To objectively evaluate the performance of text-to-speech (TTS) systems, many studies have taken the straightforward approach of aligning synthesized speech with natural speech and comparing the two. However, in most situations, no natural speech is available for comparison. In this paper, we focus on machine learning approaches to TTS evaluation. We exploit a subspace decomposition method to separate the different components of speech, which generates distinctive acoustic features automatically. Furthermore, a pairwise Support Vector Machine (SVM) model is used to evaluate TTS systems. With the original prosodic acoustic features and a Support Vector Regression model, we obtain a ranking relevance of 0.7709. With the proposed oblique matrix projection method and the pairwise SVM model, we achieve a much better result of 0.9115.
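
The pairwise SVM idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a linear model trained by plain subgradient descent on the hinge loss, and the function names (make_pairs, train_pairwise_svm, rank), the two-dimensional feature vectors, and the listener scores are all hypothetical. The point is only to show how ranking reduces to classifying difference vectors between pairs of systems.

```python
# Hypothetical sketch of a pairwise linear SVM ranker (not the paper's code).
from itertools import combinations


def make_pairs(features, scores):
    """Turn scored items into difference vectors labelled +1 or -1."""
    pairs = []
    for i, j in combinations(range(len(features)), 2):
        if scores[i] == scores[j]:
            continue  # ties carry no ordering information
        diff = [a - b for a, b in zip(features[i], features[j])]
        label = 1 if scores[i] > scores[j] else -1
        pairs.append((diff, label))
    return pairs


def train_pairwise_svm(pairs, epochs=200, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularised hinge loss over pairs."""
    dim = len(pairs[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for diff, label in pairs:
            margin = label * sum(wi * di for wi, di in zip(w, diff))
            for k in range(dim):
                grad = reg * w[k]
                if margin < 1:  # pair is misordered or inside the margin
                    grad -= label * diff[k]
                w[k] -= lr * grad
    return w


def rank(features, w):
    """Return item indices sorted from best to worst by linear score."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)


# Made-up acoustic feature vectors and made-up subjective scores for four systems.
toy_features = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.5], [0.1, 0.9]]
toy_scores = [4.5, 3.8, 2.9, 1.7]

weights = train_pairwise_svm(make_pairs(toy_features, toy_scores))
ranking = rank(toy_features, weights)
```

On this toy data the learned ranking reproduces the order of the made-up scores, with system 0 first and system 3 last. A real evaluation would replace the toy features with acoustic features extracted from synthesized speech and would typically use an established SVM library rather than this hand-written optimiser.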