Missing Modality meets Meta Sampling (M3S): An Efficient Universal Approach for Multimodal Sentiment Analysis with Missing Modality

Haozhe Chi, Minghua Yang, Junhao Zhu, Guanhong Wang, Gaoang Wang
DOI: https://doi.org/10.48550/arXiv.2210.03428
2022-10-07
Abstract: Multimodal sentiment analysis (MSA) is an important way of observing mental activities with the help of data captured from multiple modalities. However, due to recording or transmission errors, some modalities may include incomplete data. Most existing works that address missing modalities assume that a particular modality is completely missing and seldom consider a mixture of missing data across multiple modalities. In this paper, we propose a simple yet effective meta-sampling approach for multimodal sentiment analysis with missing modalities, namely Missing Modality-based Meta Sampling (M3S). Specifically, M3S incorporates a missing-modality sampling strategy into the model-agnostic meta-learning (MAML) framework. M3S can be treated as an efficient add-on training component for existing models and significantly improves their performance on multimodal data with a mixture of missing modalities. We conduct experiments on the IEMOCAP, SIMS and CMU-MOSI datasets, and achieve superior performance compared with recent state-of-the-art methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to handle multimodal sentiment analysis when the input contains partially missing data in one or more modalities, so that model performance does not degrade. Existing methods usually assume that one particular modality is completely missing and rarely consider partial missingness mixed across several modalities. In that setting they can perform poorly because they adapt badly to data with different missing rates. The paper therefore proposes a simple and effective meta-sampling method, Missing Modality-based Meta Sampling (M3S), for multimodal sentiment analysis with partially missing modalities.

### Main contributions of the paper:

1. **A simple and effective meta-training framework**: The framework handles a mixture of partially missing modalities in multimodal sentiment analysis tasks.
2. **M3S as an efficient add-on training component for existing models**: It significantly improves model performance on data with missing modalities.
3. **Comprehensive experiments on widely used datasets**: On the IEMOCAP, SIMS and CMU-MOSI datasets, M3S performs better than recent state-of-the-art methods.

### Method overview:

- **Problem description**: The goal of multimodal sentiment analysis is to predict sentiment labels \(Y\) from multimodal data \(X\), where \(X=(A, V, L)\) denotes audio, video and text data respectively. The paper focuses on the case where each modality may contain missing data.
- **Enhanced missing-modality transformation**: Given a sample \(X_i=(A_i, V_i, L_i)\), the enhanced transformation \(T(X_i; F)\) generates a sample with randomly missing data. For each modality \(m\in\{a, v, l\}\), a missing rate \(r_m\in[0, 1]\) is defined, and values within a specific range of the encoded features are replaced.
- **Meta-sampling training**: Training uses the MAML (Model-Agnostic Meta-Learning) framework. In each iteration, two independent batches \(\tilde{X}_1\) and \(\tilde{X}_2\) are sampled from the dataset, one as the support set and one as the query set. The model parameters are updated through an inner loop and an outer loop so that the model adapts to data with different missing rates.

### Experimental results:

- **Main results**: On the IEMOCAP, SIMS and CMU-MOSI datasets, M3S outperforms the original baseline methods on almost all evaluation metrics, especially at medium missing rates.
- **Different missing rates**: M3S improves model performance across missing rates, with the largest gains at medium missing rates.
- **Convergence comparison**: M3S helps the model converge faster to a lower loss value and reach higher final performance.
- **Adaptability to different missing rates**: M3S still significantly improves model performance when the test data and the input data have different missing rates.

### Conclusion:

M3S performs well on multimodal sentiment analysis with partially missing modalities and can significantly improve the performance of existing models, especially at medium missing rates. Future work will explore how to combine M3S with other training methods and how to extend it to other multimodal learning tasks.
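
The missing-modality transformation and the meta-sampling step summarized above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The function names (`mask_features`, `augment`, `meta_step`) are hypothetical, and zero-filling randomly chosen positions is an assumed masking rule. A linear model with squared loss stands in for a real sentiment model. The outer update uses the first-order approximation of MAML, which drops the second-derivative term, rather than the full second-order update.

```python
import random


def mask_features(features, rate, rng, fill=0.0):
    """Replace a fraction `rate` of the entries in `features` with `fill`.

    Assumption: positions are chosen uniformly at random and filled with
    zeros; the paper's exact replacement rule may differ.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("missing rate must lie in [0, 1]")
    n_missing = round(rate * len(features))
    missing = set(rng.sample(range(len(features)), n_missing))
    return [fill if i in missing else x for i, x in enumerate(features)]


def augment(sample, rates, rng):
    """Apply per-modality missing rates to a sample keyed by 'a', 'v', 'l'."""
    return {m: mask_features(sample[m], rates[m], rng) for m in ("a", "v", "l")}


def _flatten(sample):
    # Concatenate audio, video and text features in a fixed order.
    return sample["a"] + sample["v"] + sample["l"]


def grad_mse(w, batch):
    """Gradient of the mean squared error of a linear model over `batch`.

    `batch` is a list of (sample, label) pairs.
    """
    grad = [0.0] * len(w)
    for sample, label in batch:
        x = _flatten(sample)
        err = sum(wi * xi for wi, xi in zip(w, x)) - label
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(batch)
    return grad


def meta_step(w, support, query, inner_lr=0.01, outer_lr=0.01):
    """One first-order MAML-style update.

    Inner loop: adapt the weights on the support batch.
    Outer loop: update the original weights with the query-batch gradient
    evaluated at the adapted weights (first-order approximation).
    """
    g_support = grad_mse(w, support)
    w_fast = [wi - inner_lr * gi for wi, gi in zip(w, g_support)]
    g_query = grad_mse(w_fast, query)
    return [wi - outer_lr * gi for wi, gi in zip(w, g_query)]
```

In a training loop under these assumptions, each iteration would draw two independent batches, pass every sample through `augment` with the chosen missing rates, and call `meta_step` with the first batch as the support set and the second as the query set.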