Speech Emotion Recognition with i-vector Feature and RNN Model

Teng Zhang, Ji Wu
DOI: https://doi.org/10.1109/ChinaSIP.2015.7230458
2015-01-01
Abstract: Machine-based emotion recognition from speech has emerged as an important research area in recent years. However, most studies have been done on artificial data. The recognition task becomes harder when we face natural speech data such as real-world conversations from call centres. Along with that difficulty come some new properties that may be useful for real-world recognition tasks. In this paper, we focus on the recognition task on real-world conversations. Traditional prosodic acoustic features and the novel i-vector features are introduced and compared as ways to represent the speech signal more abstractly. We also propose a Recurrent Neural Network (RNN) approach to map the features to emotion labels. With only prosodic acoustic features and an SVM multi-classifier, we obtain an F-measure of 38.3%. By adding the i-vector features and the RNN model, we achieve a better result of 48.9%.
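The abstract reports its results as an F-measure but does not say how the score is averaged across emotion classes. As a minimal sketch, assuming macro averaging (the unweighted mean of per-class F1 scores), the metric could be computed as follows. The function name `macro_f_measure` and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def macro_f_measure(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Macro averaging is an assumption; the abstract does not state
    which averaging scheme the authors used.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    scores = []
    for c in labels:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only (not data from the paper).
y_true = ["neutral", "angry", "happy", "neutral", "angry", "happy"]
y_pred = ["neutral", "angry", "neutral", "neutral", "happy", "happy"]
score = macro_f_measure(y_true, y_pred)
```

Macro averaging gives each emotion class equal weight regardless of how often it occurs, which matters when classes are imbalanced. A weighted or micro average would give different numbers on the same predictions.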