Abstract: Owing to the diversity of social media information, improving the accuracy of social media sentiment analysis requires a comprehensive understanding of both text and image information. Observation of multimodal posts across social media platforms reveals similarities among posts containing images or texts, such as the association of rainbows with positive posts and darkness with negative posts. However, most previous studies have modeled only individual image-text pairs and ignored the global co-occurrence characteristics of the dataset. To address this problem, we propose a cross-instance graph neural network that leverages the global characteristics of the dataset to detect sentiment in text-image pairs. First, the method extracts five attributes from each image, constructs a co-occurrence matrix from the co-occurrence relationships between the attributes, and builds an attribute graph convolutional network (Attribute_GCN). For the text modality, words are used as nodes, and an edge is created between two words when they occur together more than twice. Then, point-wise mutual information and a message-passing mechanism are used to update the representations of the edges and nodes, yielding Text_GNN. The hidden representations of the two modalities are obtained by encoding. Finally, in-depth multimodal fusion with a multihead attention mechanism is performed to better predict the sentiment of image-text pairs. We conducted extensive experiments on three public multimodal datasets, and the experimental results validated the effectiveness of the proposed method.
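
The two dataset-level graphs described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "occurring together" means appearing in the same post, that the point-wise mutual information is computed from post-level frequencies, and that image attributes are already available as string labels. The function names `build_word_graph` and `attribute_cooccurrence` and the parameter `min_count` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_word_graph(docs, min_count=3):
    """Return {(w1, w2): pmi} for word pairs that share at least min_count posts.

    Assumption: co-occurrence is counted per post, and min_count=3 encodes
    "more than twice". The abstract does not specify either detail.
    """
    n_docs = len(docs)
    word_df = Counter()  # number of posts containing each word
    pair_df = Counter()  # number of posts containing each word pair
    for tokens in docs:
        vocab = sorted(set(tokens))  # count each word once per post
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))

    edges = {}
    for (a, b), count in pair_df.items():
        if count < min_count:
            continue  # too rare to create an edge
        # PMI = log( p(a, b) / (p(a) * p(b)) ) with post-level probabilities
        edges[(a, b)] = math.log(count * n_docs / (word_df[a] * word_df[b]))
    return edges


def attribute_cooccurrence(attr_sets, attributes):
    """Return a symmetric count matrix of attribute pairs seen in the same image."""
    index = {name: i for i, name in enumerate(attributes)}
    size = len(attributes)
    matrix = [[0] * size for _ in range(size)]
    for attrs in attr_sets:
        for a, b in combinations(sorted(set(attrs)), 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


if __name__ == "__main__":
    posts = [["rainbow", "happy"]] * 3 + [["dark", "sad"]] * 3 + [["rainbow", "sad"]]
    print(build_word_graph(posts))
    images = [["rainbow", "sky"], ["rainbow", "sky", "grass"]]
    print(attribute_cooccurrence(images, ["grass", "rainbow", "sky"]))
```

In this sketch the edge weights and the count matrix would serve only as the initial graph structure. The message passing, graph convolution, and attention-based fusion steps mentioned in the abstract are not shown.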

DGFN Multimodal Emotion Analysis Model Based on Dynamic Graph Fusion Network

Sentiment Analysis Using Deep Robust Complementary Fusion of Multi-Features and Multi-Modalities

MFDR: Multiple-stage Fusion and Dynamically Refined Network for Multimodal Emotion Recognition

GraphMFT: A Graph Network based Multimodal Fusion Technique for Emotion Recognition in Conversation

MM-DFN: Multimodal Dynamic Fusion Network for Emotion Recognition in Conversations

Multimodal sentiment analysis based on cross-instance graph neural networks

Multimodal Sentiment Analysis Using Multi-tensor Fusion Network with Cross-modal Modeling

MLGAT: multi-layer graph attention networks for multimodal emotion recognition in conversations

Multi-Channel Attentive Graph Convolutional Network with Sentiment Fusion for Multimodal Sentiment Analysis

Graph convolutional network with interactive memory fusion for aspect-based sentiment analysis

Multimodal Sentiment Analysis Based on Cross-Modal Attention and Gated Cyclic Hierarchical Fusion Networks

BAFN: Bi-Direction Attention Based Fusion Network for Multimodal Sentiment Analysis

Feature Extraction Network with Attention Mechanism for Data Enhancement and Recombination Fusion for Multimodal Sentiment Analysis

Fusion with Hierarchical Graphs for Mulitmodal Emotion Recognition

GA2MIF: Graph and Attention Based Two-Stage Multi-Source Information Fusion for Conversational Emotion Detection

Aspect-Level sentiment analysis based on fusion graph double convolutional neural networks

Complementary Fusion of Multi-Features and Multi-Modalities in Sentiment Analysis

Multimodal Sentiment Analysis of Graphic Texts Based on Multicategorical Relative Fusion

GraphCFC: A Directed Graph Based Cross-Modal Feature Complementation Approach for Multimodal Conversational Emotion Recognition