Multimodal Sentiment Analysis based on Supervised Contrastive Learning and Cross-modal Translation under Modalities Missing

Yuqing Miao, Gaowei Zhu, Tonglai Liu, Wanzhen Zhang, Yimin Wen, Ming Zhou
DOI: https://doi.org/10.1109/PAAP60200.2023.10391354
2023-01-01
Abstract: Multimodal sentiment analysis has been extensively researched in recent years. However, multimodal data collected in the real world are often incomplete because one or more modalities are missing, which makes the data suboptimal for analysis. To address this issue, this paper presents a multimodal sentiment analysis model, based on supervised contrastive learning and cross-modal translation, that can operate even when modalities are missing. First, supervised contrastive learning is applied to cluster the samples within a batch and establish connections between different samples; the incomplete features of a target sample are then supplemented with features from the other samples. Next, a Transformer-based cross-modal translation network learns a joint representation of the different modalities. Finally, a feature reconstruction network trained with a center distance difference loss reduces the gap between the features the model obtains under modality-missing and modality-complete conditions, with the aim of making feature extraction more robust. Experiments on two publicly available datasets show that the model's performance improves at varying levels of missing modalities.
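The abstract does not give the exact form of the supervised contrastive objective, so the following is only a minimal sketch of the generic batch-wise supervised contrastive loss, not the authors' implementation. The function name `supervised_contrastive_loss`, the `temperature` default, and the use of cosine similarity are all assumptions made for illustration. Within one batch, each sample is pulled towards the other samples that share its sentiment label and pushed away from the rest, which is the clustering effect the abstract describes.

```python
import math


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch (illustrative only).

    features: list of non-zero feature vectors, one per sample.
    labels:   list of class labels, same length as features.
    temperature: assumed scaling hyperparameter, not taken from the paper.
    """
    n = len(features)

    # L2-normalise each vector so the dot product is a cosine similarity.
    normed = []
    for vec in features:
        norm = math.sqrt(sum(x * x for x in vec))
        normed.append([x / norm for x in vec])

    per_anchor_losses = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            # An anchor with no same-label partner contributes nothing.
            continue

        # Temperature-scaled similarity between the anchor and every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in others
        }

        # Numerically stable log-sum-exp over all non-anchor samples.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )

        # Average log-probability of picking a same-label sample.
        mean_log_prob = sum(sims[j] - log_denom for j in positives) / len(positives)
        per_anchor_losses.append(-mean_log_prob)

    if not per_anchor_losses:
        return 0.0
    return sum(per_anchor_losses) / len(per_anchor_losses)
```

Under these assumptions, a batch whose same-label features already lie close together yields a lower loss than the same features paired with labels that cut across the clusters, so minimising the loss encourages label-consistent clusters within the batch.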