The Multi-modal Emotion Recognition Based on Text and Image

Wenlong Li, K. Hirota, Xingwang Liu, Yaping Dai, Zhiyang Jia
2020-01-01
Abstract: Multi-modal emotion recognition based on text and image (MMER) is proposed to address two weaknesses of models that rely on a single modality such as text, image, or speech: inaccurate emotion recognition and poor robustness. MMER compares the shallow features of the text and the image using cosine similarity. It then feeds the resulting score into a decision layer, where the score contributes to the final emotion decision together with the separate predictions of the text and image models. We constructed the experimental dataset ourselves; each row contains an image, a sentence of text, and an emotion label. On this dataset, the multimodal text-and-image model achieves a Macro-F1 score of 73.54. This is an improvement of 6.4% over the text emotion recognition models (LSTM variants) and 11.8% over the image emotion recognition model (ResNet).
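The fusion step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract. First, the text and image shallow features are taken to be already projected to a common dimension, since cosine similarity needs equal-length vectors. Second, the decision layer is modeled as a single linear layer followed by softmax, applied to the concatenation of the text model's class probabilities, the image model's class probabilities, and the similarity score. The function names (`cosine_similarity`, `decision_layer`) and the hand-set weights are hypothetical. In a real model the weights would be learned.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        # Assumption: text and image features were projected to one shared dimension.
        raise ValueError("text and image features must share one dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decision_layer(
    text_probs: Sequence[float],
    image_probs: Sequence[float],
    similarity: float,
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Hypothetical linear decision layer over [text_probs, image_probs, similarity].

    `weights` has one row per emotion class and 2 * num_classes + 1 columns.
    Returns fused class probabilities.
    """
    x = list(text_probs) + list(image_probs) + [similarity]
    logits = []
    for row, b in zip(weights, bias):
        if len(row) != len(x):
            raise ValueError("each weight row must match the fused input length")
        logits.append(sum(w * v for w, v in zip(row, x)) + b)
    return softmax(logits)


if __name__ == "__main__":
    # Toy 3-class example; features, probabilities, and weights are made up.
    text_feat = [0.2, 0.8, 0.1, 0.4]
    image_feat = [0.3, 0.7, 0.0, 0.5]
    sim = cosine_similarity(text_feat, image_feat)

    text_probs = [0.6, 0.3, 0.1]   # stand-in for a text model's output
    image_probs = [0.2, 0.7, 0.1]  # stand-in for an image model's output

    # Each class adds its own text and image probability; the similarity
    # column gets an arbitrary per-class weight purely for illustration.
    weights = [
        [1, 0, 0, 1, 0, 0, 0.5],
        [0, 1, 0, 0, 1, 0, 0.5],
        [0, 0, 1, 0, 0, 1, -0.5],
    ]
    bias = [0.0, 0.0, 0.0]
    fused = decision_layer(text_probs, image_probs, sim, weights, bias)
    print(f"similarity={sim:.3f}")
    print("fused probabilities:", [round(p, 3) for p in fused])
    print("predicted class:", max(range(len(fused)), key=fused.__getitem__))
```

Feeding the similarity score alongside both modality outputs lets the decision layer learn how much to trust agreement or disagreement between text and image. This is one plausible reading of the abstract, not the authors' confirmed architecture.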