Abstract: Multimodal sentiment analysis, which aims to recognize the emotions expressed in multimodal data, has attracted extensive attention in both academia and industry. However, most current studies on user-generated reviews classify the overall sentiment of a review and rarely consider the specific aspects that users express opinions about. In addition, user-generated reviews on social media are usually dominated by short opinionated texts, sometimes accompanied by images that complement or enhance the expressed emotion. Based on this observation, we propose a visual enhancement capsule network (VECapsNet) based on multimodal fusion for the task of aspect-based sentiment analysis. First, an adaptive mask memory capsule network is designed to extract local clustering information from the opinion text. Then, an aspect-guided visual attention mechanism is constructed to obtain the image information related to the aspect phrases. Finally, a multimodal fusion module based on interactive learning is presented for multimodal sentiment classification. It uses the aspect phrases as query vectors to continuously capture the multimodal features correlated with the affective entities over multiple rounds of iterative learning. Moreover, because few multimodal aspect-based sentiment review datasets are currently available, we build a large-scale multimodal aspect-based sentiment dataset of Chinese restaurant reviews, called MTCom. Extensive experiments on both single-modal and multimodal datasets demonstrate that our model better captures local aspect-based sentiment features and is more applicable to general multimodal user reviews than existing methods, verifying the effectiveness of the proposed VECapsNet.
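The abstract names the components but gives no equations, so the following is only a minimal sketch of the general "aspect as query" idea, not VECapsNet's published formulation. It assumes scaled dot-product attention as the scoring function and a simple residual update of the query in each fusion round. Both choices are illustrative assumptions. The helper names `attend` and `iterative_fusion` and the `rounds` parameter are hypothetical. The capsule-based text encoder is not modeled here: text and image features are taken as given vectors.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, memory: List[Vector]) -> Vector:
    """Scaled dot-product attention (an assumed scoring function).

    Each memory slot (e.g. an image-region or word feature) serves as both
    key and value. It is weighted by its similarity to the query, and the
    weighted sum of the slots is returned.
    """
    d = len(query)
    weights = softmax([dot(query, m) / math.sqrt(d) for m in memory])
    return [sum(w * m[i] for w, m in zip(weights, memory)) for i in range(d)]


def iterative_fusion(aspect: Vector, text: List[Vector], image: List[Vector],
                     rounds: int = 3) -> Vector:
    """Hypothetical multi-round fusion with the aspect vector as the initial query.

    In each round the query attends over the text features and the image
    features (aspect-guided visual attention). The two context vectors are
    then added to the query (an assumed residual update), so later rounds
    search with a query that already carries multimodal evidence.
    """
    query = list(aspect)
    for _ in range(rounds):
        text_ctx = attend(query, text)    # aspect-related textual evidence
        image_ctx = attend(query, image)  # aspect-related visual evidence
        query = [q + t + v for q, t, v in zip(query, text_ctx, image_ctx)]
    return query


if __name__ == "__main__":
    aspect = [1.0, 0.0]                   # toy aspect-phrase embedding
    text = [[1.0, 0.0], [0.0, 1.0]]       # toy word features
    image = [[0.5, 0.5], [2.0, 0.0]]      # toy image-region features
    print(iterative_fusion(aspect, text, image, rounds=2))
```

The residual update keeps the original aspect signal in the query while it accumulates evidence from both modalities. A trained model would instead use learned projections for queries, keys and values, and it would pass the final fused vector to a sentiment classifier. Neither of these is shown here.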

MASANet: Multi-Aspect Semantic Auxiliary Network for Visual Sentiment Analysis

MSFNet: modality smoothing fusion network for multimodal aspect-based sentiment analysis

ModalNet: an aspect-level sentiment classification model by exploring multimodal data with fusion discriminant attentional network

Joint Multi-modal Aspect-Sentiment Analysis with Auxiliary Cross-modal Relation Detection

Cross-modal Enhancement Network for Multimodal Sentiment Analysis

Multi-Interactive Memory Network for Aspect Based Multimodal Sentiment Analysis

MultiSentiNet: A Deep Semantic Network for Multimodal Sentiment Analysis

Multi-level textual-visual alignment and fusion network for multimodal aspect-based sentiment analysis

Targeted Aspect-Based Multimodal Sentiment Analysis: An Attention Capsule Extraction and Multi-Head Fusion Network

Deep Modular Co-Attention Shifting Network for Multimodal Sentiment Analysis

Multi-selection Attention for Multimodal Aspect-level Sentiment Classification

Interactive Fusion Network with Recurrent Attention for Multimodal Aspect-based Sentiment Analysis

Joint Multimodal Aspect Sentiment Analysis with Aspect Enhancement and Syntactic Adaptive Learning

Multi-Grained Fusion Network with Self-Distillation for Aspect-Based Multimodal Sentiment Analysis

An Interactive Attention Mechanism Fusion Network for Aspect-Based Multimodal Sentiment Analysis

Cross-modal fine-grained alignment and fusion network for multimodal aspect-based sentiment analysis

TCAN: Text-oriented Cross Attention Network for Multimodal Sentiment Analysis

Visual Enhancement Capsule Network for Aspect-based Multimodal Sentiment Analysis

Aspect-aware Semantic Feature Enhanced Networks for Multimodal Aspect-Based Sentiment Analysis

Sentiment Analysis of Social Media Comments Based on Multimodal Attention Fusion Network