Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition

Zirun Guo, Tao Jin, Zhou Zhao
2024-07-07
Abstract: The development of multimodal models has significantly advanced multimodal sentiment analysis and emotion recognition. However, in real-world applications, the presence of various missing modality cases often leads to a degradation in the model's performance. In this work, we propose a novel multimodal Transformer framework using prompt learning to address the issue of missing modalities. Our method introduces three types of prompts: generative prompts, missing-signal prompts, and missing-type prompts. These prompts enable the generation of missing modality features and facilitate the learning of intra- and inter-modality information. Through prompt learning, we achieve a substantial reduction in the number of trainable parameters. Our proposed method outperforms other methods significantly across all evaluation metrics. Extensive experiments and ablation studies are conducted to demonstrate the effectiveness and robustness of our method, showcasing its ability to effectively handle missing modalities.
Computation and Language, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper addresses the problem of missing modalities in multimodal sentiment analysis and emotion recognition. It proposes a multimodal Transformer framework that uses prompt learning to handle inputs in which one or more modalities are absent.

#### The main contributions are as follows:

1. **A new framework**: The framework uses prompt learning to address missing modalities in sentiment analysis and emotion recognition tasks. It is computationally efficient and handles missing modalities during both training and testing.
2. **Parameter count linear in the number of modalities**: The number of prompts across the three types (generative, missing-signal, and missing-type) grows linearly with the number of modalities, which significantly reduces the demand for computational resources.
3. **Three types of prompts**: Together, these prompts generate the features of missing modalities and support the learning of both intra-modality and inter-modality information.
4. **Performance on multiple datasets**: The proposed model significantly outperforms baseline methods on all evaluation metrics. The study also found that a 70% modality dropout rate during training gave the best model performance. The effectiveness and robustness of the method are validated through extensive experiments and ablation studies.
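The modality dropout mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the modality names, the function `apply_modality_dropout`, and the rule of dropping a random non-empty subset while always keeping at least one modality are all assumptions made for illustration. Only the 70% rate comes from the summary.

```python
import random

# Assumed modality names; the summary does not list them.
MODALITIES = ("text", "audio", "video")


def apply_modality_dropout(sample, drop_rate=0.7, rng=random):
    """Return (masked_sample, missing) for one training sample.

    With probability `drop_rate`, a random non-empty subset of modalities
    is replaced by None. At least one modality is always kept (an
    assumption of this sketch). The input dict is not modified.
    """
    masked = dict(sample)
    missing = []
    if rng.random() < drop_rate:
        # Drop between 1 and (number of modalities - 1) modalities.
        num_to_drop = rng.randint(1, len(MODALITIES) - 1)
        missing = sorted(rng.sample(MODALITIES, num_to_drop))
        for name in missing:
            masked[name] = None
    return masked, missing


if __name__ == "__main__":
    example = {"text": [0.1, 0.2], "audio": [0.3], "video": [0.4, 0.5]}
    masked, missing = apply_modality_dropout(example, rng=random.Random(0))
    print("missing modalities:", missing)
```

In a setup like the one described, the returned `missing` list is the kind of information a model could use to decide which features must be generated and which missing-signal or missing-type prompts apply. How the paper wires this up is not specified in this summary.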