Speech Emotion Classification with the Combination of Statistic Features and Temporal Features.

DN Jiang, LH Cai
DOI: https://doi.org/10.1109/icme.2004.1394647
2004-01-01
Abstract: For classifying speech emotion, most previous systems have used either statistical features or temporal features exclusively. However, these two feature representations appear to capture different aspects of emotion and should be combined for the task. This work proposes a classification scheme that combines both. In the scheme, GMM and HMM are first used to model the statistical features and temporal features, respectively. The GMM likelihoods and HMM likelihoods are then used as features in a further stage. Finally, a weighted Bayesian classifier and an MLP are applied to perform the classification. Experiments on a Chinese speech corpus demonstrated that the scheme improves classification accuracy greatly. More detailed analysis indicated that the two feature representations complement each other effectively in classification.
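
The fusion stage described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact form of the weighted Bayesian classifier, so the rule below (a weighted sum of per-class log-likelihoods plus a log prior) is an assumption, not the authors' formulation. The function names, the `weight` parameter, the emotion labels, and the numeric scores are all hypothetical, and the GMM and HMM training steps are omitted.

```python
import math

def fuse_scores(gmm_loglik, hmm_loglik, weight=0.5, log_prior=None):
    """Combine per-class log-likelihoods from two models into one score per class.

    Assumed rule (not taken from the paper):
        score(c) = weight * log p_GMM(x|c) + (1 - weight) * log p_HMM(x|c) + log P(c)
    """
    classes = sorted(gmm_loglik)
    if log_prior is None:
        # Uniform prior over classes when none is supplied.
        log_prior = {c: -math.log(len(classes)) for c in classes}
    return {
        c: weight * gmm_loglik[c] + (1.0 - weight) * hmm_loglik[c] + log_prior[c]
        for c in classes
    }

def classify(gmm_loglik, hmm_loglik, weight=0.5):
    """Return the class with the highest fused score."""
    scores = fuse_scores(gmm_loglik, hmm_loglik, weight)
    return max(scores, key=scores.get)

# Hypothetical log-likelihoods for a single utterance (illustrative values only).
gmm = {"angry": -10.0, "happy": -12.0, "sad": -20.0}
hmm = {"angry": -15.0, "happy": -9.0, "sad": -18.0}

print(classify(gmm, hmm, weight=0.5))  # both models contribute equally
print(classify(gmm, hmm, weight=1.0))  # GMM (statistical features) only
```

In this toy input the two models disagree on the top class, so the chosen label depends on `weight`. The second variant in the abstract, an MLP, would instead take the concatenated GMM and HMM likelihoods as its input vector.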