Towards Robust Multimodal Sentiment Analysis with Incomplete Data

Haoyu Zhang, Wenbin Wang, Tianshu Yu
2024-11-01
Abstract: The field of Multimodal Sentiment Analysis (MSA) has recently witnessed an emerging direction seeking to tackle the issue of data incompleteness. Recognizing that the language modality typically contains dense sentiment information, we consider it as the dominant modality and present an innovative Language-dominated Noise-resistant Learning Network (LNLN) to achieve robust MSA. The proposed LNLN features a dominant modality correction (DMC) module and dominant modality based multimodal learning (DMML) module, which enhances the model's robustness across various noise scenarios by ensuring the quality of dominant modality representations. Aside from the methodical design, we perform comprehensive experiments under random data missing scenarios, utilizing diverse and meaningful settings on several popular datasets (e.g., MOSI, MOSEI, and SIMS), providing additional uniformity, transparency, and fairness compared to existing evaluations in the literature. Empirically, LNLN consistently outperforms existing baselines, demonstrating superior performance across these challenging and extensive evaluation metrics.
Computation and Language, Artificial Intelligence, Multimedia
What problem does this paper attempt to address?
This paper addresses incomplete data in multimodal sentiment analysis (MSA). Specifically, it focuses on improving the robustness and accuracy of the model when data is missing in practical applications, for example because of sensor failures or automatic speech recognition (ASR) errors. To meet this challenge, the authors propose a new method, the Language-dominated Noise-resistant Learning Network (LNLN). LNLN enhances the robustness of the model through the following mechanisms:

1. **Dominant Modality Correction (DMC)**:
   - It uses adversarial learning and a dynamic weighted enhancement strategy to reduce the impact of noise on the dominant modality (i.e., the language modality).
   - Specific steps include Completeness Check and Proxy Dominant Feature Generation.
2. **Dominant Modality based Multimodal Learning (DMML)**:
   - It fuses the corrected dominant modality features with the auxiliary modality (visual and audio) features to achieve effective multimodal classification.
3. **Reconstructor**:
   - It reconstructs missing information to further improve the robustness of the system.

The paper evaluates LNLN under different noise levels through experiments on multiple popular datasets (such as MOSI, MOSEI, and SIMS). The experimental results show that LNLN handles missing data well and improves the accuracy and robustness of multimodal sentiment analysis.
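To make the "random data missing" evaluation setting more concrete, here is a minimal sketch of how such corruption could be simulated on one sample. It is an illustration under stated assumptions, not the authors' code: the function name `random_missing`, the choice to zero out dropped time steps, the independent per-modality masking, and the feature shapes are all hypothetical.

```python
import numpy as np


def random_missing(features, missing_rate, rng):
    """Drop each time step of a (seq_len, dim) array with probability missing_rate.

    Assumption: a dropped step is replaced by zeros. The paper may fill
    missing positions differently.
    Returns the corrupted array and the boolean keep-mask.
    """
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must be in [0, 1]")
    # rng.random draws from [0, 1), so rate 0 keeps everything and rate 1 drops everything.
    keep = rng.random(features.shape[0]) >= missing_rate
    return features * keep[:, None], keep


rng = np.random.default_rng(0)

# Hypothetical feature shapes, chosen only for illustration.
sample = {
    "language": np.ones((50, 768)),
    "audio": np.ones((50, 5)),
    "visual": np.ones((50, 20)),
}

for rate in (0.0, 0.3, 0.6, 0.9):
    for name, x in sample.items():
        corrupted, keep = random_missing(x, rate, rng)
        print(f"rate={rate:.1f} {name:8s} kept {int(keep.sum())}/{len(keep)} steps")
```

Sweeping the missing rate in this way gives one robustness curve per model, which is the kind of comparison the summary describes when it refers to performance "under different noise levels".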