Automatic detection of contrastive word pairs using textual and acoustic features

Zang Xiao, Wu Zhiyong, Ning Yishuang, Meng Helen, Cai Lianhong
DOI: https://doi.org/10.1109/ICOSP.2014.7015073
IF: 4.729
2014-01-01
Signal Processing
Abstract: Labeling emphatic words in speech recordings plays an important role in building speech corpora for expressive speech synthesis. Speakers often pronounce some words more strongly than usual, which makes the speech more expressive and signals the focus of the sentence. Contrastive word pairs are often pronounced with stronger prominence, and their presence modifies the meaning of the utterance in subtle but important ways. We used a subset of the Switchboard corpus to study the acoustic characteristics of contrastive word pairs and the differences between contrastive and non-contrastive words. Support vector machines (SVMs) are then used to detect contrastive word pairs automatically. We report detection results based on textual and acoustic features. Adding acoustic features to the textual features yields much better performance.
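The classification setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the kernel, the feature set, or the training procedure, so the sketch assumes a linear SVM trained by subgradient descent on the hinge loss. The feature names (`same_part_of_speech`, `lexical_similarity`, `pitch_z`, `energy_z`) and all values are hypothetical, and the toy data is deliberately constructed so that the textual features alone cannot separate the two classes. It reports training accuracy only and says nothing about the paper's actual results.

```python
# Toy sketch: linear SVM (hinge loss, subgradient descent) on hypothetical features.
# Every feature name and value below is invented for illustration.

def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=100):
    """Per-sample subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularization shrinks the weights on every step.
            w = [(1.0 - lr * lam) * wi for wi in w]
            if margin < 1.0:
                # Hinge loss is active: push the boundary away from this sample.
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
                b += lr * label
    return w, b

def predict(w, b, x):
    """Return +1 (contrastive) or -1 (non-contrastive)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

def accuracy(w, b, X, y):
    hits = sum(predict(w, b, x) == label for x, label in zip(X, y))
    return hits / len(y)

# Hypothetical candidate word pairs: +1 = contrastive, -1 = non-contrastive.
# textual:  [same_part_of_speech, lexical_similarity]
# acoustic: [pitch_z, energy_z]  (z-normalized prominence measures)
textual = [[1.0, 0.5], [1.0, 0.5], [0.0, 0.2], [0.0, 0.2]]
acoustic = [[1.0, 1.0], [-1.0, -1.0], [0.8, 1.2], [-0.8, -1.2]]
labels = [1, -1, 1, -1]

# Combine the two feature groups by simple concatenation.
combined = [t + a for t, a in zip(textual, acoustic)]

w_text, b_text = train_linear_svm(textual, labels)
w_both, b_both = train_linear_svm(combined, labels)

acc_text = accuracy(w_text, b_text, textual, labels)
acc_both = accuracy(w_both, b_both, combined, labels)
print(f"textual only: {acc_text:.2f}, textual + acoustic: {acc_both:.2f}")
```

Concatenating the feature groups keeps the comparison simple: the same trainer runs on both inputs, so any difference in accuracy comes from the added acoustic dimensions. A real system would use held-out evaluation and features extracted from the corpus.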