Hierarchical Interactive Multimodal Transformer for Aspect-Based Multimodal Sentiment Analysis

Jianfei Yu, Kai Chen, Rui Xia
DOI: https://doi.org/10.1109/taffc.2022.3171091
IF: 13.99
2022-01-01
IEEE Transactions on Affective Computing
Abstract: Aspect-based multimodal sentiment analysis (ABMSA) aims to determine the sentiment polarity of each aspect or entity mentioned in a multimodal post or review. Previous studies on ABMSA fall into two subtasks: aspect-term based multimodal sentiment classification (ATMSC) and aspect-category based multimodal sentiment classification (ACMSC). However, these existing studies have three shortcomings: (1) ignoring the object-level semantics in images; (2) primarily focusing on aspect-text and aspect-image interactions; (3) failing to consider the semantic gap between text and image representations. To tackle these issues, we propose a general Hierarchical Interactive Multimodal Transformer (HIMT) model for ABMSA. Specifically, we extract salient features with semantic concepts from images via an object detection method, and then propose a hierarchical interaction module that first models the aspect-text and aspect-image interactions and then captures the text-image interactions. Moreover, an auxiliary reconstruction module is devised to largely eliminate the semantic gap between text and image representations. Experimental results show that our HIMT model significantly outperforms state-of-the-art methods on two benchmarks for ATMSC and one benchmark for ACMSC.
computer science, cybernetics, artificial intelligence