Improving High Quality TTS using Circular Linear Prediction and Constant Pitch Transform

S. Shukla, T. P. Barnwell
DOI: https://doi.org/10.1109/icassp.2007.367004
2007-04-01
Abstract: Current high-quality concatenative TTS systems are based on unit selection from a database that is contextually and prosodically rich. These systems are computationally expensive and require a very large footprint. This paper presents a new method for representing speech segments that can improve the quality and scalability of concatenative TTS systems. The circular linear prediction model, combined with the constant pitch transform, provides a robust representation of speech signals that allows limited prosodic movements without perceptible loss in quality. A method is presented for constraining the line spectral frequency (LSF) tracks of speech segments to realize pitch modifications with minimal artifacts. The results of formal listening tests demonstrate that limited prosodic modifications can produce speech from fewer units whose quality equals or exceeds that of large-database unit-selection systems. Additionally, this method is used to realize high-quality emphasized speech.