Detection and Emphatic Realization of Contrastive Word Pairs for Expressive Text-to-speech Synthesis

Chunrong Li, Zhiyong Wu, Fanbo Meng, Helen Meng, Lianhong Cai
DOI: https://doi.org/10.1109/iscslp.2012.6423493
2012-01-01
Abstract: This paper addresses the automatic detection of contrastive word pairs and their emphatic acoustic realization for expressive text-to-speech (TTS) synthesis in English. Support vector machines (SVMs) are used to detect contrastive word pairs automatically from lexical features, syntactic dependencies and semantic relations. Adding accent ratio and word identity features yields much better performance. Hidden Markov model (HMM) based speech synthesis is then used to generate emphatic speech by placing emphasis on the detected contrastive word pairs. Subjective experiments show that most listeners find emphasis on contrastive word pairs more acceptable than emphasis on non-contrastive word pairs, which indicates the importance of detecting contrastive word pairs accurately.
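The accent ratio feature mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the feature, so this assumes the common reading: the fraction of a word's occurrences that carry a pitch accent in a prosodically annotated corpus, with a neutral fallback for rare or unseen words. The function names, the `min_count` threshold, the `0.5` default and the toy annotations are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def accent_ratios(tagged_words, min_count=2, default=0.5):
    """Map each word to the fraction of its occurrences that are accented.

    tagged_words: iterable of (word, is_accented) pairs.
    Words seen fewer than min_count times get the neutral default
    (an assumption of this sketch, not the paper's stated rule).
    """
    total = Counter()
    accented = Counter()
    for word, is_accented in tagged_words:
        key = word.lower()
        total[key] += 1
        if is_accented:
            accented[key] += 1
    return {
        w: (accented[w] / n if n >= min_count else default)
        for w, n in total.items()
    }


def pair_features(w1, w2, ratios, default=0.5):
    """Accent-ratio features for one candidate word pair (hypothetical layout)."""
    return [ratios.get(w1.lower(), default), ratios.get(w2.lower(), default)]


# Toy, made-up annotations: (word, carries a pitch accent?)
corpus = [
    ("red", True), ("red", True),
    ("the", False), ("the", False), ("the", True),
    ("blue", True),
]
ratios = accent_ratios(corpus)
features = pair_features("Red", "green", ratios)
```

In a full detector, values like these would be appended to the lexical, syntactic and semantic features of each candidate pair before the pair is passed to the SVM classifier.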