Speech emotion recognition in nature and scripted state based on deep learning

Wei Wang, Tingting Hu, Yaqin Feng
DOI: https://doi.org/10.13232/j.cnki.jnju.2019.04.016
2019-01-01
Abstract: Speech is an important channel of emotional expression. The emotional information carried by speech differs between the natural and scripted speaking states. To explore how speech emotion recognition differs between the natural and scripted states, a deep learning algorithm is used to analyze the public IEMOCAP dataset. Four emotions (neutral, anger, happy, and sad) are analyzed in the experiments. First, acoustic features are extracted, and the emobase2010 and eGeMAPS feature sets are compared. Then, a convolutional neural network (CNN) is used to recognize speech emotion in the natural and scripted states separately. Finally, confusion matrices are used to analyze the difference in recognition accuracy between the two states for each emotion. The results show that emotion recognition accuracy in the natural state was significantly higher than in the scripted state. There was also a significant difference between the two states for the angry and sad emotions. The results may help in understanding the mechanism of emotional expression.
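
The confusion-matrix step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names (`confusion_matrix`, `per_class_recall`) and the toy label arrays are assumptions made for the example, and the numbers are not results from the paper. The sketch builds a 4x4 confusion matrix over the four emotions and reads the per-emotion accuracy (recall) off its diagonal, which is the quantity one would compare between the natural and scripted subsets.

```python
import numpy as np

# The four emotion classes named in the abstract, indexed 0..3.
EMOTIONS = ["neutral", "anger", "happy", "sad"]


def confusion_matrix(y_true, y_pred, n_classes):
    """Count how often true class i (row) was predicted as class j (column)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def per_class_recall(cm):
    """Fraction of each true class that was recognized correctly."""
    row_totals = cm.sum(axis=1)
    # Classes with no samples get recall 0 instead of a division error.
    return np.divide(
        np.diag(cm),
        row_totals,
        out=np.zeros(len(cm), dtype=float),
        where=row_totals > 0,
    )


# Hypothetical toy labels for one speaking state (illustrative only).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 0, 3, 3]

cm = confusion_matrix(y_true, y_pred, len(EMOTIONS))
recall = per_class_recall(cm)
overall_accuracy = np.trace(cm) / cm.sum()

for name, r in zip(EMOTIONS, recall):
    print(f"{name}: {r:.2f}")
print(f"overall: {overall_accuracy:.2f}")
```

Running the same two functions once on predictions for natural utterances and once on predictions for scripted utterances gives two recall vectors that can be compared emotion by emotion.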