Automatic Emotion Variation Detection in Continuous Speech.

Yuchao Fan, Mingxing Xu, Zhiyong Wu, Lianhong Cai
DOI: https://doi.org/10.1109/apsipa.2014.7041592
2014-01-01
Abstract: Though emotional speech recognition has gained increasing interest in the field of Human-Computer Interaction, it is still a challenge to automatically determine the emotion type and the boundaries of each emotionally salient segment in continuous speech, a task known as Automatic Emotion Variation Detection (AEVD). In this task, the input utterances are not pre-segmented and may contain emotion variations. This paper proposes a Multi-timescaled Sliding Window based AEVD method (MSW-AEVD). First, a fixed-length sliding window is employed to segment continuous speech for classic emotion recognition. An emotion type is assigned to each window-shift according to the recognition results of all the sliding windows containing that window-shift. This basic procedure is then extended to multi-timescaled sliding windows, in which different features are used at different scales. Finally, a post-processing step refines the final outputs. In this work, we focus on the anger-neutral and happiness-neutral cases, which dominate recent studies of AEVD. Performance is evaluated on two databases: the German database EMO-DB and the Chinese database TH1309-DB. Experimental results show that the proposed method significantly outperforms the HMM-based baseline.
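The basic single-scale procedure described in the abstract can be illustrated with a minimal sketch. It assumes that each sliding window has already been labeled by some emotion recognizer and that a window-shift takes the majority label of the windows covering it; the abstract does not specify the combination rule, so the majority vote, the tie-breaking behavior, and the names `label_shifts` and `to_segments` are all hypothetical. The multi-timescale extension and the post-processing step are not covered.

```python
from collections import Counter


def label_shifts(window_labels, shifts_per_window):
    """Assign one label per window-shift from the windows that contain it.

    window_labels[i] is the label predicted for the window that starts at
    shift i and spans shifts i .. i + shifts_per_window - 1.
    Assumption: majority vote; ties go to the label seen first.
    """
    if not window_labels:
        return []
    n_shifts = len(window_labels) + shifts_per_window - 1
    votes = [Counter() for _ in range(n_shifts)]
    for start, label in enumerate(window_labels):
        for s in range(start, start + shifts_per_window):
            votes[s][label] += 1
    return [v.most_common(1)[0][0] for v in votes]


def to_segments(shift_labels):
    """Merge runs of equal labels into (start_shift, end_shift, label) tuples.

    end_shift is exclusive, so each boundary between two segments is the
    index where the label changes.
    """
    segments = []
    for i, label in enumerate(shift_labels):
        if segments and segments[-1][2] == label:
            start, _, _ = segments[-1]
            segments[-1] = (start, i + 1, label)
        else:
            segments.append((i, i + 1, label))
    return segments
```

For example, with five windows labeled `["N", "N", "A", "A", "A"]` and three shifts per window, `label_shifts` yields `["N", "N", "N", "A", "A", "A", "A"]`, and `to_segments` reduces that to `[(0, 3, "N"), (3, 7, "A")]`, placing the emotion boundary at shift 3.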