Multi-Modal Emotion Recognition Based On Deep Learning Of EEG And Audio Signals

Zhongjie Li, Gaoyan Zhang, Jianwu Dang, Longbiao Wang, Jianguo Wei
DOI: https://doi.org/10.1109/IJCNN52387.2021.9533663
2021-01-01
Abstract: Automatic recognition of human emotional states has recently attracted considerable attention in human-computer interaction and emotional brain-computer interfaces. However, the accuracy of emotion recognition remains unsatisfactory. Because deep learning on multi-modal emotion-related signals allows the modalities to supplement one another, this study proposes a novel emotion recognition architecture that fuses emotional features from the brain electroencephalography (EEG) signal and the corresponding audio signal, evaluated on the DEAP dataset. We used a convolutional neural network (CNN) to extract EEG features and a bidirectional long short-term memory (BiLSTM) neural network to extract audio features. We then combined the multi-modal features in a deep learning architecture to recognize arousal and valence levels. The results showed improved accuracy on both arousal and valence compared with previous studies that used only EEG signals, which suggests that the proposed multi-modal fusion model is effective. In future work, multi-modal data from natural interaction scenes will be collected and fed into this architecture to further validate the method.
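The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the features are combined, so this sketch assumes simple concatenation of the two feature vectors followed by two independent binary (high/low) heads for arousal and valence. The CNN and BiLSTM extractors are not implemented here; their outputs are replaced by random placeholder vectors. All names (`fuse`, `BinaryHead`, `predict`), the feature sizes, and the untrained logistic heads are hypothetical and are not taken from the paper.

```python
import math
import random
from typing import Dict, List


def sigmoid(x: float) -> float:
    """Logistic function mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(eeg_features: List[float], audio_features: List[float]) -> List[float]:
    # Assumption: feature-level fusion by concatenation. The paper may
    # combine the modalities differently.
    return eeg_features + audio_features


class BinaryHead:
    """Untrained logistic head; a hypothetical stand-in for the classifier."""

    def __init__(self, n_inputs: int, rng: random.Random) -> None:
        self.weights = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
        self.bias = 0.0

    def predict_proba(self, features: List[float]) -> float:
        score = self.bias + sum(w * f for w, f in zip(self.weights, features))
        return sigmoid(score)


def predict(
    eeg_features: List[float],
    audio_features: List[float],
    arousal_head: BinaryHead,
    valence_head: BinaryHead,
) -> Dict[str, str]:
    """Fuse both modalities, then label each dimension as high or low."""
    fused = fuse(eeg_features, audio_features)
    return {
        "arousal": "high" if arousal_head.predict_proba(fused) >= 0.5 else "low",
        "valence": "high" if valence_head.predict_proba(fused) >= 0.5 else "low",
    }


rng = random.Random(0)
eeg = [rng.random() for _ in range(8)]    # placeholder for CNN-extracted EEG features
audio = [rng.random() for _ in range(4)]  # placeholder for BiLSTM-extracted audio features
arousal_head = BinaryHead(len(eeg) + len(audio), rng)
valence_head = BinaryHead(len(eeg) + len(audio), rng)
result = predict(eeg, audio, arousal_head, valence_head)
print(result)
```

Because the heads are untrained, the printed labels carry no meaning; the sketch only shows how features from both modalities would flow into a shared prediction stage.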