Fusion Model for Speech Emotion Recognition with Low Level Descriptor Features

Cheng Chang, Huifeng Zhang, Zhangxuan Gu, Yanmin Qian
2017-01-01
Abstract: Speech emotion recognition is one of the most challenging speech processing tasks, with many applications in Human-Machine Interaction (HMI). Traditional approaches have used Gaussian Mixture Models (GMM) for classification. In recent research, Deep Neural Networks (DNN) have shown a strong ability for feature learning and modeling in many tasks. In this paper, we propose using a DNN to learn additional features from Low Level Descriptor (LLD) features for use by other classifiers, and we propose a decision-level fusion model for speech emotion recognition. We carry out experiments and evaluations on FAU-AIBO, a public German corpus with five emotion classes. Our experimental results using LLD features demonstrate that the proposed approach improves unweighted recognition performance and significantly outperforms the GMM baseline.
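As a rough illustration of the two ideas the abstract names, the sketch below fuses the class posteriors of several classifiers at the decision level and scores the result with an unweighted average recall. The abstract does not state the fusion rule or the exact metric, so both are assumptions here: fusion is taken to be a weighted average of posteriors, "unweighted" is read as the mean of per-class recalls, and the function names `fuse_posteriors` and `unweighted_average_recall` are hypothetical.

```python
import numpy as np


def fuse_posteriors(posteriors, weights=None):
    """Decision-level fusion (assumed rule): weighted average of class posteriors.

    posteriors: list of arrays, each of shape (n_samples, n_classes),
                one array per classifier.
    weights:    optional per-classifier weights; uniform if omitted.
    """
    stacked = np.stack(posteriors)  # (n_classifiers, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(posteriors))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    # Contract the classifier axis: result has shape (n_samples, n_classes).
    return np.tensordot(weights, stacked, axes=1)


def unweighted_average_recall(y_true, y_pred, n_classes):
    """Mean of per-class recalls, so every class counts equally
    regardless of how many samples it has."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for c in range(n_classes):
        mask = y_true == c
        if mask.any():  # skip classes absent from the reference labels
            recalls.append(np.mean(y_pred[mask] == c))
    return float(np.mean(recalls))


# Toy two-class example with two hypothetical classifiers.
p_a = np.array([[0.6, 0.4], [0.3, 0.7]])
p_b = np.array([[0.2, 0.8], [0.4, 0.6]])
fused = fuse_posteriors([p_a, p_b])
fused_labels = fused.argmax(axis=1)
```

On imbalanced corpora, a per-class average of this kind keeps a frequent class from dominating the score, which is the usual reason for reporting unweighted results alongside plain accuracy.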