Image and Text Aspect Level Multimodal Sentiment Classification Model Using Transformer and Multilayer Attention Interaction

Xiuye Yin,Liyong Chen
DOI: https://doi.org/10.4018/ijdwm.333854
2023-11-15
International Journal of Data Warehousing and Mining
Abstract:Many existing image and text sentiment analysis methods only consider the interaction between image and text modalities, while ignoring the inconsistency and correlation of image and text data, to address this issue, an image and text aspect level multimodal sentiment analysis model using transformer and multi-layer attention interaction is proposed. Firstly, ResNet50 is used to extract image features, and RoBERTa-BiLSTM is used to extract text and aspect level features. Then, through the aspect direct interaction mechanism and deep attention interaction mechanism, multi-level fusion of aspect information and graphic information is carried out to remove text and images unrelated to the given aspect. The emotional representations of text data, image data, and aspect type sentiments are concatenated, fused, and fully connected. Finally, the designed sentiment classifier is used to achieve sentiment analysis in terms of images and texts. This effectively has improved the performance of sentiment discrimination in terms of graphics and text.
computer science, software engineering
What problem does this paper attempt to address?