Perceptual Evaluation Weight Training for Text-to-Speech Synthesis

Wu Zhiyong, Cai Lianhong, Cai Rui
DOI: https://doi.org/10.3321/j.issn:1000-0054.2005.01.014
2005-01-01
Abstract: A weight training approach based on perceptual evaluation was developed for unit selection in concatenative text-to-speech (TTS) synthesis. Speech units are first selected using the current weights, and the synthesized result is then evaluated syllable by syllable in a perceptual listening test. This evaluation yields new candidate units together with their perceptual scores. The weights are then adjusted so that unit selection favors these new candidates, which amounts to supervised weight training. Tests show that the method raises the naturalness score of the synthesized speech from about 3.49 to 4.02. The method also allows the TTS system to be optimized automatically based on feedback from user behavior.
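The abstract does not state the form of the selection cost or the exact update rule. The sketch below is therefore only an illustration under assumed choices, not the authors' method: each candidate unit is described by a vector of sub-cost features, its cost is a linear weighted sum of those features, and whenever listeners prefer a candidate the current weights did not pick, the weights are moved by a perceptron-style step that lowers the preferred unit's cost relative to the selected one. All function names (`weighted_cost`, `select_unit`, `update_weights`, `train`) and the learning rate `lr` are hypothetical.

```python
from typing import List, Sequence, Tuple

Features = Sequence[float]


def weighted_cost(weights: Sequence[float], features: Features) -> float:
    """Assumed linear cost of one candidate unit: sum of weight * sub-cost."""
    return sum(w * f for w, f in zip(weights, features))


def select_unit(weights: Sequence[float], candidates: Sequence[Features]) -> int:
    """Return the index of the lowest-cost candidate under the current weights."""
    return min(range(len(candidates)), key=lambda i: weighted_cost(weights, candidates[i]))


def update_weights(weights: Sequence[float], selected: Features,
                   preferred: Features, lr: float = 0.1) -> List[float]:
    """Perceptron-style step that lowers cost(preferred) - cost(selected).

    The gradient of that difference with respect to the weights is
    (preferred - selected), so we step by lr * (selected - preferred).
    """
    if weighted_cost(weights, preferred) < weighted_cost(weights, selected):
        return list(weights)  # already ranked correctly; leave weights unchanged
    new = [w + lr * (s - p) for w, s, p in zip(weights, selected, preferred)]
    return [max(0.0, w) for w in new]  # keep every weight non-negative


def train(weights: Sequence[float],
          samples: Sequence[Tuple[Sequence[Features], int]],
          epochs: int = 10, lr: float = 0.1) -> List[float]:
    """Each sample pairs a candidate list with the listener-preferred index."""
    w = list(weights)
    for _ in range(epochs):
        for candidates, preferred_idx in samples:
            chosen = select_unit(w, candidates)
            if chosen != preferred_idx:
                w = update_weights(w, candidates[chosen], candidates[preferred_idx], lr)
    return w


if __name__ == "__main__":
    cands = [[0.2, 0.9], [0.8, 0.1]]
    print(select_unit([1.0, 1.0], cands))      # initial weights pick candidate 1
    trained = train([1.0, 1.0], [(cands, 0)], epochs=5, lr=0.5)
    print(trained, select_unit(trained, cands))  # candidate 0 is now selected
```

In this toy run, equal initial weights select candidate 1; after a single update driven by a (made-up) listener preference for candidate 0, the weights become roughly `[1.3, 0.6]` and candidate 0 wins. A ranking-style update was chosen here because the abstract describes training the weights to match perceptually better candidates rather than to regress absolute scores; a real system could equally use the perceptual scores directly.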