Weakly Correlated Multimodal Sentiment Analysis: New Dataset and Topic-oriented Model

Wuchao Liu, Wengen Li, Yu-Ping Ruan, Yulou Shu, Juntao Chen, Yina Li, Caili Yu, Yichao Zhang, Jihong Guan, Shuigeng Zhou
DOI: https://doi.org/10.1109/taffc.2024.3396144
IF: 13.99
2024-01-01
IEEE Transactions on Affective Computing
Abstract: Existing multimodal sentiment analysis models focus on fusing highly correlated image-text pairs, and thus achieve unsatisfactory performance on multimodal social media data, which usually manifests weak correlations between modalities. To address this issue, we first build a large multimodal social media sentiment analysis dataset, RU-Senti, which contains more than 100,000 image-text pairs with sentiment labels. Then, we propose a topic-oriented model (TOM), which assumes that text is usually related to a certain portion of the image content and that image-text pairs of the same topic often have similar sentiment tendencies. TOM learns topic information from the textual content and uses a topic-oriented feature alignment module to extract from images the information correlated with the textual semantics, thus aligning the two modalities. TOM then uses a transformer encoder, initialized with the parameters of a pre-trained vision-language model, to fuse the multimodal features for sentiment prediction. Experiments on the public MVSA-Multiple dataset and our RU-Senti dataset show that RU-Senti is well suited to studying weakly correlated multimodal sentiment analysis, and that TOM substantially outperforms SOTA multimodal sentiment analysis methods and pre-trained vision-language models. The RU-Senti dataset and the code of TOM are available at https://github.com/PhenoixYANG/TOM.
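The idea that text relates to only a portion of the image can be illustrated with a minimal sketch of topic-guided attention pooling. This is not the TOM implementation from the paper or its repository; the function name `topic_guided_pooling`, the two-dimensional toy vectors, and the use of scaled dot-product attention are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def topic_guided_pooling(topic_vec, region_feats):
    """Hypothetical helper: weight image regions by their similarity to a
    text-derived topic vector, then return the weights and the weighted
    average of the region features."""
    scale = math.sqrt(len(topic_vec))
    weights = softmax([dot(topic_vec, r) / scale for r in region_feats])
    dim = len(region_feats[0])
    pooled = [
        sum(w * r[i] for w, r in zip(weights, region_feats))
        for i in range(dim)
    ]
    return weights, pooled


# Toy data (assumed): one topic vector and three image-region features.
topic = [1.0, 0.0]
regions = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]
weights, pooled = topic_guided_pooling(topic, regions)
```

In this toy setup the region most aligned with the topic vector receives the largest weight, so weakly related regions contribute less to the pooled image representation that would then be passed on for fusion.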