Few-shot Joint Multimodal Aspect-Sentiment Analysis Based on Generative Multimodal Prompt

Xiaocui Yang, Shi Feng, Daling Wang, Sun Qi, Wenfang Wu, Yifei Zhang, Pengfei Hong, Soujanya Poria
2023-05-18
Abstract: We have witnessed the rapid proliferation of multimodal data on numerous social media platforms. Conventional studies typically require massive labeled data to train models for Multimodal Aspect-Based Sentiment Analysis (MABSA). However, collecting and annotating fine-grained multimodal data for MABSA is tough. To alleviate the above issue, we perform three MABSA-related tasks with quite a small number of labeled multimodal samples. We first build diverse and comprehensive multimodal few-shot datasets according to the data distribution. To capture the specific prompt for each aspect term in a few-shot scenario, we propose a novel Generative Multimodal Prompt (GMP) model for MABSA, which includes the Multimodal Encoder module and the N-Stream Decoders module. We further introduce a subtask to predict the number of aspect terms in each instance to construct the multimodal prompt. Extensive experiments on two datasets demonstrate that our approach outperforms strong baselines on two MABSA-related tasks in the few-shot setting.
Multimedia
What problem does this paper attempt to address?
The paper addresses Multimodal Aspect-Based Sentiment Analysis (MABSA) in the few-shot setting, where only a small number of labeled multimodal samples is available. It focuses on the following aspects:

1. **Data Annotation Challenge**: Conventional MABSA methods typically require a large amount of labeled data to train models. However, collecting and annotating fine-grained multimodal data is difficult and time-consuming.
2. **Multi-task Processing**: The paper proposes a Generative Multimodal Prompt (GMP) model to handle three MABSA-related tasks: Multimodal Aspect Term Extraction (MATE), Multimodal Aspect-oriented Sentiment Classification (MASC), and Joint Multimodal Aspect-Sentiment Analysis (JMASA).
3. **Prompt Generation**: The number of aspect terms in each instance is not known in advance, so the paper introduces a subtask that predicts this number and uses it to construct the multimodal prompt. Through the Multimodal Encoder module and the N-Stream Decoders module, a specific prompt is generated for each aspect term.
4. **Experimental Validation**: The authors build few-shot datasets according to the data distribution and run extensive experiments on two multimodal datasets. The results show that the proposed model outperforms strong baselines on two MABSA-related tasks in the few-shot setting.

In summary, the main goal of this paper is to make aspect-level multimodal sentiment analysis workable when labeled data is scarce, using aspect-specific generated prompts together with an aspect-count prediction subtask.
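The abstract says the few-shot datasets are built "according to the data distribution" but does not spell out the procedure. As a rough illustration only, the sketch below draws a small subset whose label proportions match those of the full set. The function name `sample_few_shot`, the `sentiment` field, the toy data, and the choice to stratify by a single sentiment label are all assumptions made for this example, not the authors' method.

```python
import random
from collections import Counter, defaultdict


def sample_few_shot(examples, label_of, fraction, seed=0):
    """Draw a subset whose label proportions roughly match the full set.

    Hypothetical helper: `fraction` is expected to lie in (0, 1], and
    `label_of` maps an example to the key used for stratification.
    """
    rng = random.Random(seed)

    # Group examples by label so each group can be sampled separately.
    by_label = defaultdict(list)
    for ex in examples:
        by_label[label_of(ex)].append(ex)

    subset = []
    for label in sorted(by_label):
        group = by_label[label]
        # Keep at least one example per label so rare labels survive.
        k = max(1, round(len(group) * fraction))
        subset.extend(rng.sample(group, k))
    return subset


# Toy data (invented for illustration): 60 POS, 30 NEU, 10 NEG.
toy = [
    {"id": i, "sentiment": s}
    for i, s in enumerate(["POS"] * 60 + ["NEU"] * 30 + ["NEG"] * 10)
]

few_shot = sample_few_shot(toy, lambda ex: ex["sentiment"], fraction=0.1)
label_counts = Counter(ex["sentiment"] for ex in few_shot)
# A 10% sample of 60/30/10 keeps 6 POS, 3 NEU and 1 NEG examples.
```

Fixing the seed makes the draw repeatable, which is useful when several few-shot splits need to be compared. The floor of one example per label is a design choice for this sketch and slightly over-represents very rare labels.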