Automatic Emotion Recognition of Speech Signal in Mandarin

Sheng Zhang,P. C. Ching,Fanrang Kong
DOI: https://doi.org/10.21437/interspeech.2006-500
2006-01-01
Abstract:Traditionally, a simultaneous recognition process using the same feature set of a spoken utterance is used to classify the emotional state of the speaker in addition to its content. However, an analysis on the classification performance for every pair of emotions shows that different features have distinctive classification abilities for different emotions. Therefore, we propose an efficient emotion recognition process called cascade bisection (CB-process), which carries out emotion recognition by means of several bisecting steps and applies different feature sets for every step. This process is based on the features' different abilities of classifying emotions. Through this, we can fully utilize the information extracted from features and achieve a better recognition performance. Five discrete emotional states, namely, neutral, anger, fear, joy, and sadness are distinguished from the input Mandarin speech. After extracting the acoustic features that contain information on short-time energy (amplitude), signal amplitude, and pitch, we derive the representation feature set for further use in the CB-process, which achieves better emotion recognition as demonstrated seen from the experimental results.
What problem does this paper attempt to address?