Speech Emotion Recognition Based on Hyper-Prosodic Features

Bicheng Jin, Gang Liu
DOI: https://doi.org/10.1109/icctec.2017.00027
2017-01-01
Abstract: Speech emotion recognition is widely used across a range of applications and has become an active research topic. However, because of limitations such as the features used, current recognition performance is still far from what practical applications require. This paper proposes the viewpoint that speech emotion is expressed largely through long-term changes in prosody. Based on this, a feature extraction method, extraction of hyper-prosodic features (EHPF), is proposed. The raw signal is down-sampled by representing it as a contour of prosodic features such as fundamental frequency. From this feature contour, a large number of statistical features can be extracted, so that the feature set retains as much emotion information as possible. Feature selection then removes redundancy and keeps the most relevant features. Comparative experiments are conducted on several public databases with various classifiers, including SVM, GBDT, random forest, and DNN. The results are close to, and in some cases exceed, the state of the art on multiple databases. This indicates that the method achieves better performance and supports the viewpoint put forward in this paper.
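The pipeline outlined in the abstract (prosodic contour, then statistical features, then redundancy removal) can be illustrated with a minimal sketch. The abstract does not list the actual EHPF statistics or the selection criterion, so everything here is an assumption made for illustration: the function names `contour_statistics` and `drop_redundant`, the choice of six statistics, the convention that a value of 0 marks an unvoiced frame, and the use of a Pearson-correlation threshold to detect redundant features.

```python
import math
import statistics


def contour_statistics(contour):
    """Summarise a per-frame prosodic contour (e.g. F0 in Hz) with global statistics.

    Assumption: frames with value 0 are unvoiced and are skipped.
    """
    voiced = [v for v in contour if v > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    n = len(voiced)
    mean = statistics.fmean(voiced)
    # Least-squares slope against frame index: a simple long-term trend measure.
    t_mean = (n - 1) / 2
    num = sum((i - t_mean) * (v - mean) for i, v in enumerate(voiced))
    den = sum((i - t_mean) ** 2 for i in range(n))
    return {
        "mean": mean,
        "std": statistics.pstdev(voiced),
        "min": min(voiced),
        "max": max(voiced),
        "range": max(voiced) - min(voiced),
        "slope": num / den,
    }


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def drop_redundant(rows, names, threshold=0.95):
    """Keep a feature only if it is not strongly correlated with one already kept.

    rows: list of feature dicts, one per utterance.
    threshold: hypothetical cut-off on |correlation|, not taken from the paper.
    """
    columns = {name: [row[name] for row in rows] for name in names}
    kept = []
    for name in names:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

In such a setup, `contour_statistics` would be applied to each utterance's contour, `drop_redundant` to the resulting table of features, and the surviving columns passed to a classifier such as an SVM or random forest.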