Multimodal PEAR Chain-of-Thought Reasoning for Multimodal Sentiment Analysis

Yan Li, Xiangyuan Lan, Haifeng Chen, Ke Lu, Dongmei Jiang
DOI: https://doi.org/10.1145/3672398
2024-06-11
Abstract: Multimodal sentiment analysis aims to predict sentiment from multimodal signals such as audio, video, and text. Existing methods often rely on Pre-trained Language Models (PLMs) to extract semantic information from textual data, but they lack an in-depth understanding of the logical relationships within the text modality. This paper introduces Multimodal PEAR Chain-of-Thought (MM-PEAR-CoT) reasoning for multimodal sentiment analysis. Inspired by the human thought process when solving complex problems, the PEAR (Preliminaries, quEstion, Answer, Reason) chain-of-thought prompt is first proposed to induce Large Language Models (LLMs) to generate text-based reasoning processes and zero-shot sentiment predictions. However, text-based chain-of-thought reasoning is not always reliable and may contain irrational steps caused by LLM hallucinations. To address this, we further design the Cross-Modal Filtering and Fusion (CMFF) module. The filtering submodule uses the audio and visual modalities to suppress irrational steps in the chain of thought, while the fusion submodule integrates high-level reasoning information and cross-modal complementary information during semantic representation learning. Experimental results on two multimodal sentiment analysis benchmark datasets show that high-level reasoning information helps learn discriminative text representations, and that cross-modal complementary information keeps the model from being misled by unreasonable steps in the chain of thought. MM-PEAR-CoT achieves the best results on both datasets, with improvements of 2.2% and 1.7% in binary classification accuracy on the CMU-MOSI and CMU-MOSEI datasets, respectively. To the best of our knowledge, this is the first study to apply chain-of-thought reasoning to multimodal sentiment analysis.
computer science, information systems, theory & methods, software engineering
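
To make the PEAR prompt structure named in the abstract more concrete, here is a minimal sketch of how a four-part prompt (Preliminaries, quEstion, Answer, Reason) might be assembled for an LLM. The abstract does not give the authors' actual prompt wording, so the function name `build_pear_prompt`, the default preliminaries and question text, and the exact layout are all assumptions made for illustration only.

```python
# Hypothetical sketch of a PEAR-style prompt; not the authors' actual prompt.

# Section order follows the PEAR acronym from the abstract.
PEAR_SECTIONS = ("Preliminaries", "Question", "Answer", "Reason")


def build_pear_prompt(
    utterance: str,
    preliminaries: str = (
        "You are analyzing the sentiment of an utterance "
        "transcribed from a video clip."
    ),
    question: str = "Is the sentiment of the speaker positive or negative?",
) -> str:
    """Return a prompt whose sections appear in PEAR order.

    The Answer and Reason sections are left empty so that the LLM fills in
    a zero-shot sentiment prediction and the reasoning behind it.
    """
    lines = [
        f"Preliminaries: {preliminaries}",
        f'Utterance: "{utterance}"',
        f"Question: {question}",
        "Answer:",   # assumed slot for the zero-shot prediction
        "Reason:",   # assumed slot for the chain-of-thought explanation
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_pear_prompt("The movie was fine, I guess."))
```

The generated reasoning text would then be passed, together with audio and visual features, to a downstream module such as the CMFF described in the abstract; that part is not sketched here because the abstract gives no architectural details.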