Abstract: In response to the explosion of multi-modal data, multi-modal sentiment analysis (MSA) has emerged and attracted widespread attention. Unfortunately, conventional multi-modal research relies on large-scale datasets. On the one hand, collecting and annotating such datasets is challenging and resource-intensive; on the other hand, training on them increases research costs. In contrast, few-shot MSA (FMSA), which was proposed recently, requires only a few samples for training, which makes it more practical and realistic. Prompt-based methods have already been investigated for FMSA, but they have not sufficiently considered or leveraged the specificity of visual-modality information. We therefore propose a graph-based, vision-enhanced prompt model that better utilizes visual information for fusion and collaboration when encoding and optimizing prompt representations. Specifically, we first design an aggregation-based multi-modal attention module. Then, building on this module and biaffine attention, we construct a syntax–semantic dual-channel graph convolutional network that optimizes the encoding of learnable prompts by capturing vision-enhanced semantic and syntactic knowledge. Finally, we propose a collaboration-based optimization module built on a collaborative attention mechanism, which uses visual information to collaboratively optimize prompt representations. Extensive experiments on both coarse-grained and fine-grained MSA datasets demonstrate that our model significantly outperforms the baseline models.
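The abstract names biaffine attention and a syntax–semantic dual-channel graph convolutional network without giving formulas. The following is a minimal sketch of one common way these two ideas fit together, not the authors' implementation. It assumes the standard biaffine scoring form s_ij = h_i^T U h_j + w^T [h_i; h_j] + b for the semantic graph, a dependency-parse adjacency for the syntactic graph, and one GCN layer per channel. The aggregation-based multi-modal attention and collaborative attention modules are omitted; `H` simply stands in for already fused text-visual token features. All function names, the toy parse in `heads`, and the averaging fusion of the two channels are hypothetical.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def biaffine_adjacency(H, U, w, b):
    """Semantic channel: score each token pair with
    s_ij = h_i^T U h_j + w^T [h_i; h_j] + b, then softmax each row."""
    A = []
    for h_i in H:
        # h_i + h_j concatenates the two lists, i.e. [h_i; h_j]
        scores = [dot(h_i, matvec(U, h_j)) + dot(w, h_i + h_j) + b for h_j in H]
        A.append(softmax(scores))
    return A


def dependency_adjacency(heads):
    """Syntactic channel: undirected edges from dependency heads
    (heads[i] is the parent of token i, -1 for the root), plus self-loops,
    row-normalised so each row sums to 1."""
    n = len(heads)
    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            A[child][head] = A[head][child] = 1.0
    normed = []
    for row in A:
        s = sum(row)
        normed.append([x / s for x in row])
    return normed


def gcn_layer(A, H, W):
    """One GCN layer: H'_i = ReLU(sum_j A_ij * (W h_j))."""
    WH = [matvec(W, h) for h in H]
    out = []
    for i in range(len(H)):
        agg = [sum(A[i][j] * WH[j][k] for j in range(len(H))) for k in range(len(WH[0]))]
        out.append([max(0.0, x) for x in agg])
    return out


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n, d, d_out = 4, 3, 2
H = rand_matrix(n, d)                      # stand-in for fused text-visual token features
U, W = rand_matrix(d, d), rand_matrix(d_out, d)
w, b = rand_matrix(1, 2 * d)[0], 0.0
heads = [1, -1, 1, 2]                      # hypothetical dependency parse

A_sem = biaffine_adjacency(H, U, w, b)
A_syn = dependency_adjacency(heads)
H_sem = gcn_layer(A_sem, H, W)
H_syn = gcn_layer(A_syn, H, W)
# Fuse the channels by averaging (an assumption; the paper's fusion may differ)
H_out = [[(a + s) / 2 for a, s in zip(r1, r2)] for r1, r2 in zip(H_sem, H_syn)]
```

The semantic channel yields a dense, learned graph over all token pairs, while the syntactic channel is sparse and fixed by the parse; running the same GCN layer over both and merging the results is what makes the network "dual-channel" in this sketch.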

Unified Multi-modal Pre-training for Few-shot Sentiment Analysis with Prompt-based Learning

Few-Shot Multi-Modal Sentiment Analysis with Prompt-Based Vision-Aware Language Modeling

Attention-Optimized Vision-Enhanced Prompt Learning for Few-Shot Multi-Modal Sentiment Analysis

Syntax-Aware Hybrid Prompt Model for Few-Shot Multi-Modal Sentiment Analysis

Mixture-of-Prompt-Experts for Multi-modal Semantic Understanding

Few-shot Joint Multimodal Aspect-Sentiment Analysis Based on Generative Multimodal Prompt

Few-shot Multimodal Sentiment Analysis based on Multimodal Probabilistic Fusion Prompts

Prompt and Contrastive Learning for Few-shot Sentiment Classification

MuDPT: Multi-modal Deep-symphysis Prompt Tuning for Large Pre-trained Vision-Language Models

Towards Unified Prompt Tuning for Few-shot Text Classification

Sentiment-aware Multimodal Pre-Training for Multimodal Sentiment Analysis

Multi-Task Pre-Training of Modular Prompt for Few-Shot Learning

Multitask Pre-training of Modular Prompt for Chinese Few-Shot Learning

Adaptive Prompt Learning-Based Few-Shot Sentiment Analysis

VLP2MSA: Expanding Vision-Language Pre-Training to Multimodal Sentiment Analysis

Understanding the Multi-modal Prompts of the Pre-trained Vision-Language Model

Vision-Language Pre-Training for Multimodal Aspect-Based Sentiment Analysis

A Soft Contrastive Learning-based Prompt Model for Few-shot Sentiment Analysis

Unified Prompt Learning Makes Pre-Trained Language Models Better Few-Shot Learners

Multimodal Sentiment Analysis With Two-Phase Multi-Task Learning

UP-DP: Unsupervised Prompt Learning for Data Pre-Selection with Vision-Language Models