Multimodal sentiment analysis based on cross-instance graph neural networks

Hongbin Wang, Chun Ren, Zhengtao Yu
DOI: https://doi.org/10.1007/s10489-024-05309-0
IF: 5.3
2024-02-22
Applied Intelligence
Abstract: Owing to the diversity of social media information, improving the accuracy of sentiment analysis for social media requires a comprehensive understanding of both text and image information. Multimodal posts across social media platforms show recurring patterns in their images and texts, such as the association of rainbows with positive posts and darkness with negative posts. However, most previous studies have modeled only individual image-text pairs and ignored the global co-occurrence characteristics of the dataset. To address this problem, we propose a cross-instance graph neural network that leverages the global characteristics of the dataset to detect sentiment in text-image pairs. First, the method extracts five attributes from each image, constructs a co-occurrence matrix from the co-occurrence relationships between the attributes, and builds an attribute graph convolutional network (Attribute_GCN). For the text modality, words are used as nodes, and an edge is created between two words when they occur together more than twice. Point-wise mutual information and a message-passing mechanism are then used to update the representations of the edges and nodes, yielding Text_GNN. Hidden representations of each modality are obtained by encoding. Finally, in-depth multimodal fusion with a multihead attention mechanism is applied to better predict the sentiment of image-text pairs. We conducted extensive experiments on three public multimodal datasets, and the experimental results validated the effectiveness of the proposed method.
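The text-graph construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how co-occurrence is counted, so this sketch assumes two words co-occur when they appear in the same post, tokenizes by whitespace, and estimates point-wise mutual information from post-level frequencies. The function name `build_text_graph` and the parameter `min_cooccurrence` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_text_graph(posts, min_cooccurrence=3):
    """Return {(word_a, word_b): pmi} for word pairs that co-occur often enough.

    Assumptions (not from the paper): co-occurrence is counted per post,
    and probabilities are estimated as fractions of posts.
    """
    n_posts = len(posts)
    word_counts = Counter()  # number of posts containing each word
    pair_counts = Counter()  # number of posts containing both words

    for post in posts:
        # Unique, sorted tokens so each unordered pair has one canonical key.
        vocab = sorted(set(post.lower().split()))
        word_counts.update(vocab)
        pair_counts.update(combinations(vocab, 2))

    edges = {}
    for (a, b), n_ab in pair_counts.items():
        # "More than twice" means at least three co-occurrences.
        if n_ab < min_cooccurrence:
            continue
        # PMI = log( p(a, b) / (p(a) * p(b)) ), with p estimated over posts.
        pmi = math.log(n_ab * n_posts / (word_counts[a] * word_counts[b]))
        edges[(a, b)] = pmi
    return edges


posts = [
    "rainbow happy",
    "rainbow happy",
    "rainbow happy",
    "dark sad",
    "dark sad",
]
edges = build_text_graph(posts)
```

In this toy input, "happy" and "rainbow" share three posts and receive an edge, while "dark" and "sad" share only two and do not. The resulting PMI values would serve as initial edge weights that a message-passing step then updates.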
computer science, artificial intelligence