Gated attention fusion network for multimodal sentiment classification

Yongping Du, Yang Liu, Zhi Peng, Xingnan Jin
DOI: https://doi.org/10.1016/j.knosys.2021.108107
2022-03-01
Abstract: Sentiment classification explores the opinions people express and helps them make better decisions. As multimodal content on the web grows, including text, images, audio and video, making full use of it becomes important in many tasks, sentiment classification among them. This paper focuses on text and images. Previous work fails to capture fine-grained image features and introduces considerable noise during feature fusion. In this work, we propose a novel multimodal sentiment classification model based on a gated attention mechanism. The image feature is used to emphasize text segments through the attention mechanism, which allows the model to focus on the text that affects sentiment polarity. Moreover, the gating mechanism enables the model to retain useful image information while ignoring the noise introduced during the fusion of image and text. Experimental results on the Yelp multimodal dataset show that our model outperforms the previous state-of-the-art model. Ablation results further support the effectiveness of the individual strategies in the proposed model.
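The two ideas named in the abstract, image-guided attention over text segments and a gate on the image information, can be illustrated with a minimal sketch. This is not the authors' architecture: the dot-product scoring, the per-dimension sigmoid gate, the additive fusion, and the names `image_guided_attention`, `gated_fusion`, `w_text`, `w_image` and `bias` are all assumptions made for illustration, and the fixed scalar weights stand in for parameters that a real model would learn.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def image_guided_attention(text_segments, image_feat):
    """Weight each text segment by its similarity to the image feature.

    Assumption: plain dot-product scoring; the paper may score differently.
    """
    weights = softmax([dot(h, image_feat) for h in text_segments])
    dim = len(image_feat)
    context = [
        sum(w * h[j] for w, h in zip(weights, text_segments))
        for j in range(dim)
    ]
    return weights, context


def gated_fusion(text_context, image_feat, w_text=1.0, w_image=1.0, bias=0.0):
    """Per-dimension gate in (0, 1) controlling how much image signal is kept.

    Assumption: scalar placeholder weights instead of learned matrices.
    """
    gate = [
        sigmoid(w_text * t + w_image * v + bias)
        for t, v in zip(text_context, image_feat)
    ]
    fused = [t + g * v for t, g, v in zip(text_context, gate, image_feat)]
    return gate, fused


# Toy inputs: three text-segment vectors and one image vector (made-up values).
text_segments = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
image_feat = [2.0, 0.0]

weights, context = image_guided_attention(text_segments, image_feat)
gate, fused = gated_fusion(context, image_feat)
print("attention weights:", weights)
print("gate:", gate)
print("fused representation:", fused)
```

In this toy setup the segment most aligned with the image vector receives the largest attention weight, and a gate value near zero would suppress the corresponding image dimension before it is added to the text context.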
computer science, artificial intelligence