Multi-level Contrastive Learning: Hierarchical Alleviation of Heterogeneity in Multimodal Sentiment Analysis

Cunhang Fan, Kang Zhu, Jianhua Tao, Guofeng Yi, Jun Xue, Zhao Lv
DOI: https://doi.org/10.1109/taffc.2024.3423671
IF: 13.99
2024-01-01
IEEE Transactions on Affective Computing
Abstract: Recently, multimodal fusion efforts have achieved remarkable success in Multimodal Sentiment Analysis (MSA). However, most existing methods are based on model-level fusion, and the challenge of heterogeneity between modalities is not well resolved. Heterogeneity lies in the different feature distributions and distinct representation spaces of the modalities. To mitigate this problem, we propose that fusion is a progressive process, and we introduce a novel multi-level contrastive learning and multi-layer convolution fusion (MCL-MCF) method for MSA. Based on the relationships among multimodal data, we divide the fusion process into three levels: single-modal to single-modal, single-modal to bimodal or trimodal, and semantic consistency among higher-level fused modalities. The first-level contrastive learning alleviates heterogeneity between unimodal representations at the early stage of multimodal feature fusion. The second-level contrastive learning mitigates heterogeneity between unimodal and fused modalities. At the third level, we introduce a tensor convolution fusion (TCF) module that extracts high-level semantic features from the fused modalities and mitigates heterogeneity at the higher feature level through contrastive learning. To simulate fusion as a progressive process, MCF is proposed to fuse shallow and deep features and thereby model complex relationships among modalities. Experiments on three public datasets show that our approach achieves state-of-the-art performance.
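
The first-level idea of pulling unimodal representations of the same sample toward each other can be illustrated with a minimal sketch. The abstract does not specify the loss, so the code below assumes a common InfoNCE-style objective averaged over the three modality pairs; the function names `info_nce` and `first_level_loss`, the temperature value, and the embedding shapes are all hypothetical and are not taken from the paper.

```python
import numpy as np


def info_nce(a, b, temperature=0.1):
    """InfoNCE-style loss between two batches of embeddings, shape (batch, dim).

    Assumption: row i of `a` and row i of `b` come from the same sample
    (positive pair); every other row of `b` serves as a negative.
    """
    # L2-normalise so the dot product is cosine similarity.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal.
    return float(-np.mean(np.diag(log_prob)))


def first_level_loss(text, audio, video, temperature=0.1):
    """Hypothetical first-level loss: average over the three unimodal pairs."""
    pairs = [(text, audio), (text, video), (audio, video)]
    return sum(info_nce(x, y, temperature) for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(8, 16))
    # Toy stand-ins: audio and video are noisy copies of the text embedding.
    audio = text + 0.05 * rng.normal(size=(8, 16))
    video = text + 0.05 * rng.normal(size=(8, 16))
    print("aligned loss:", first_level_loss(text, audio, video))
    print("misaligned loss:", first_level_loss(text, np.roll(audio, 1, axis=0), video))
```

In this toy setting the loss is small when the modality embeddings of the same sample agree and grows when the pairing is broken. The second and third levels described in the abstract would presumably apply a similar objective between unimodal and fused representations, but that is an assumption rather than something the abstract states.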