Multimodal sentiment analysis with BERT-ResNet50

Senchang Zhang, Yue He, Lei Li, Yaowen Dou
DOI: https://doi.org/10.1117/12.2679113
2023-05-08
Abstract: Differences in the information carried by each modality, together with insufficient fusion between modalities, lower the prediction accuracy of current multimodal sentiment analysis models. To address this problem, this paper designs a multimodal sentiment analysis model based on BERT-ResNet50. The model uses BERT and ResNet50 to extract text and image features, respectively, fuses the multimodal information through the encoder layer of the Transformer, and finally classifies the fused representation with a Softmax layer. Experiments are conducted on the public Twitter sarcasm dataset. The proposed BERT-ResNet50 model outperforms the comparison models in accuracy, recall, and F1 score, reaching an accuracy of 74.05%. Ablation experiments show that the model is more accurate with multimodal input than with single-modality input.
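The abstract reports accuracy, recall, and F1 score but does not spell out how they are computed. The sketch below uses the standard definitions for a single positive class, assuming a binary labeling (1 = sarcastic, 0 = not sarcastic). The function name `binary_metrics` and the toy labels are hypothetical illustrations, not the authors' evaluation code or data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for one positive class.

    Assumption: labels are binary, with `positive` marking the sarcastic class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no true positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only; these are not results from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Because F1 is the harmonic mean of precision and recall, it penalizes a model that achieves high recall only by over-predicting the positive class, which is why it is commonly reported alongside accuracy.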
Engineering