Speech emotion recognition based on convolution neural network combined with random forest

Li Zheng, Qiao Li, Hua Ban, Shuhua Liu
DOI: https://doi.org/10.1109/ccdc.2018.8407844
2018-06-01
Abstract: The key to speech emotion recognition is the extraction of speech emotion features. In this paper, a new network model (CNN-RF) based on a convolutional neural network combined with a random forest is proposed. First, the convolutional neural network is used as the feature extractor to extract speech emotion features from the normalized spectrogram, and a random forest classification algorithm is used to classify those features. Experimental results show that the CNN-RF model is superior to the traditional CNN model. Second, the Record Sound command box of the Nao robot is improved and the CNN-RF model is applied to the robot. Finally, the Nao robot can “try to figure out” a human's psychology through speech emotion recognition and recognize people's happiness, anger, sadness and joy, achieving more intelligent human-computer interaction.
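The two-stage idea in the abstract (convolutional features from a normalized spectrogram, then a random forest as the classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption and not the authors' implementation: the spectrograms are synthetic, the min-max normalization is one possible choice, and a single layer of fixed random 3x3 kernels stands in for a trained CNN. The helper names `normalize_spectrogram`, `conv_features` and `make_fake_spectrogram` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def normalize_spectrogram(spec):
    """Min-max scale a spectrogram to roughly [0, 1] (assumed normalization)."""
    lo, hi = spec.min(), spec.max()
    return (spec - lo) / (hi - lo + 1e-8)


def conv_features(spec, kernels):
    """Valid 2-D convolution, ReLU, then global max pooling.

    Returns one feature per kernel. In the paper's setting these kernels
    would come from a trained CNN; here they are fixed random filters.
    """
    kh, kw = kernels.shape[1:]
    # All kh x kw patches of the spectrogram: (H-kh+1, W-kw+1, kh, kw)
    patches = np.lib.stride_tricks.sliding_window_view(spec, (kh, kw))
    # Correlate every patch with every kernel: (n_kernels, H-kh+1, W-kw+1)
    maps = np.einsum("ijkl,fkl->fij", patches, kernels)
    maps = np.maximum(maps, 0.0)   # ReLU
    return maps.max(axis=(1, 2))   # global max pooling


rng = np.random.default_rng(0)
kernels = rng.standard_normal((8, 3, 3))  # stand-in for learned CNN filters


def make_fake_spectrogram(label):
    """Synthetic 32x32 'spectrogram': class 0 has horizontal stripes, class 1 vertical."""
    spec = rng.random((32, 32)) * 0.1
    if label == 0:
        spec[::4, :] += 1.0
    else:
        spec[:, ::4] += 1.0
    return spec


# Stage 1: extract convolutional features from normalized spectrograms.
labels = np.array([0, 1] * 20)
X = np.stack([
    conv_features(normalize_spectrogram(make_fake_spectrogram(y)), kernels)
    for y in labels
])

# Stage 2: a random forest replaces the usual softmax layer as the classifier.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X[:30], labels[:30])
predictions = forest.predict(X[30:])
```

In a real pipeline the feature vector would typically be taken from an intermediate layer of a CNN trained on labeled emotional speech, and the forest would be fit on those activations; the sketch only shows how the two stages connect.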