Fine-grained Sentiment Feature Extraction Method for Cross-modal Sentiment Analysis

Ye Sun, Guozhe Jin, Yahui Zhao, Rongyi Cui
DOI: https://doi.org/10.1145/3651671.3651755
2024-01-01
Abstract: When people use images to express emotions, the emotional information is often strongly associated with only some regions of the image, and these regions are also described in the accompanying comments. The relationship between the emotional regions of an image and the relevant text is therefore of great significance for cross-modal sentiment analysis. However, existing methods simply use an object detection model to extract multiple object regions from the image, either without effective screening or with a screening method so coarse that it introduces too much noise. Based on this observation, this paper proposes a novel image-text interactive filtering mechanism that screens fine-grained sentiment regions and captures fine-grained sentiment features for cross-modal sentiment analysis. A sentiment consistency learning method is then designed to obtain a better sentiment feature encoding, giving the model stronger sentiment classification ability. In addition, because the emotional regions extracted by object detection may not represent the complete emotional information, we integrate the contextual feature representation of each individual modality to achieve more reliable prediction. We name the combined approach the Fine-Grained Sentiment Consistency Interaction Network (FSCIN). It achieves good performance improvements on three cross-modal sentiment analysis datasets, which demonstrates the effectiveness of our method.