Maximum A Posteriori Based Fusion Method For Speech Emotion Recognition

Ling Cen, Zhu Liang Yu, Wee Ser
DOI: https://doi.org/10.1002/9781118910566.ch9
2015-01-01
Abstract: With the growing demand for spoken language interfaces in human-computer interaction, automatic recognition of emotional states from human speech has become increasingly important. In our previous work, we proposed a hybrid scheme that combines the Probabilistic Neural Network (PNN) and the Universal Background Model-Gaussian Mixture Model (UBM-GMM) for speech emotion recognition. In this chapter, we generalize that hybrid scheme into a Maximum A Posteriori (MAP) based fusion method that can combine the strengths of two or more classification methods to recognize emotional states in speech signals. To illustrate the effectiveness of the proposed method, the numerical experiments use PNN, UBM-GMM, and k-Nearest Neighbor (k-NN) as base classifiers. In the classification of 15 emotional states for samples extracted from the LDC database, the fusion method achieves higher accuracies than those obtained using the base classifiers alone. The experimental results also show that the proposed MAP-based method works well with a small training dataset.
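The abstract does not state the fusion rule itself. As a hedged illustration only, the sketch below shows one textbook way to merge several classifiers' class posteriors into a single MAP decision. It assumes that each classifier's evidence x_k is conditionally independent given the emotion class c, which gives P(c | x_1, ..., x_K) ∝ P(c)^(1-K) · ∏_k P(c | x_k). The function name `map_fuse`, the independence assumption, and the example scores are hypothetical and are not taken from the chapter.

```python
import math
from typing import List, Sequence, Tuple


def map_fuse(posteriors: Sequence[Sequence[float]],
             prior: Sequence[float]) -> Tuple[int, List[float]]:
    """Hypothetical product-rule MAP fusion of K classifiers over C classes.

    posteriors[k][c] is classifier k's estimate of P(c | x_k); prior[c] is P(c).
    ASSUMPTION: the classifiers' evidence is conditionally independent given c,
    which is an illustrative choice, not the chapter's stated model.
    """
    k = len(posteriors)
    n_classes = len(prior)
    eps = 1e-12  # small constant to avoid log(0)
    log_scores = []
    for c in range(n_classes):
        # log P(c | all evidence) = sum_k log P(c | x_k) - (K - 1) log P(c) + const
        s = sum(math.log(p[c] + eps) for p in posteriors)
        s -= (k - 1) * math.log(prior[c] + eps)
        log_scores.append(s)
    # Normalize in the log domain for numerical stability
    m = max(log_scores)
    unnorm = [math.exp(s - m) for s in log_scores]
    total = sum(unnorm)
    fused = [u / total for u in unnorm]
    # MAP decision: the class with the largest fused posterior
    best = max(range(n_classes), key=lambda c: fused[c])
    return best, fused


# Made-up scores from three base classifiers over three classes, uniform prior
pnn = [0.5, 0.3, 0.2]
ubm_gmm = [0.4, 0.4, 0.2]
knn = [0.2, 0.6, 0.2]
best, fused = map_fuse([pnn, ubm_gmm, knn], [1 / 3, 1 / 3, 1 / 3])
print(best, [round(f, 3) for f in fused])
```

With these invented scores, the first classifier alone would favor class 0, while the fused posterior favors class 1. This only demonstrates the mechanics of combining posteriors and says nothing about the accuracies reported in the chapter.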