FDR-MSA: Enhancing multimodal sentiment analysis through feature disentanglement and reconstruction

Yao Fu, Biao Huang, Yujun Wen, Pengzhou Zhang
DOI: https://doi.org/10.1016/j.knosys.2024.111965
IF: 8.139
2024-05-23
Knowledge-Based Systems
Abstract: Multimodal sentiment analysis (MSA) is crucial as it integrates textual, visual, and audio information from videos to accurately identify human emotional states. This study proposes a multimodal feature decoupling strategy that separates sentiment features into common and private features. The private features capture what is unique to each modality, thereby increasing feature diversity. The common features capture what the modalities share, thus reducing potential information loss during decoupling. To achieve this, we design dedicated encoders and loss function constraints for both types of features. Additionally, to mitigate information redundancy and prevent key information loss during decoupled representation learning, we introduce a dual feature reconstruction mechanism comprising unimodal feature reconstruction (UFR) and multimodal feature reconstruction (MFR). These mechanisms preserve vital information from the decoupling process and mitigate the impact of redundant data. Extensive experiments on three datasets show that our method outperforms existing advanced techniques by approximately 1%–3% in accuracy.
computer science, artificial intelligence
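The decoupling idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the encoders, the loss forms, or their weights, so the linear maps, the squared-dot-product "difference" term, the mean-squared-error "similarity" and reconstruction terms, and every function and variable name below are assumptions chosen for illustration. Only a unimodal-style reconstruction is shown; multimodal feature reconstruction (MFR) is not sketched.

```python
# Illustrative sketch only. All names and loss forms are hypothetical;
# the abstract does not give the paper's actual encoders or losses.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear(weights, x):
    """Apply a weight matrix (a list of rows) to a feature vector."""
    return [dot(row, x) for row in weights]

def mse(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

def disentangle_losses(features, common_w, private_w, decoder_w):
    """features maps a modality name to its feature vector.

    Returns (similarity, difference, reconstruction) losses.
    """
    common = {m: linear(common_w[m], x) for m, x in features.items()}
    private = {m: linear(private_w[m], x) for m, x in features.items()}
    names = sorted(features)
    # Similarity: pull the common features of different modalities together.
    sim = sum(mse(common[a], common[b])
              for i, a in enumerate(names) for b in names[i + 1:])
    # Difference: penalize overlap between a modality's common and private parts.
    diff = sum(dot(common[m], private[m]) ** 2 for m in names)
    # Reconstruction: decode the concatenated [common, private] vector
    # back to the original input of the same modality.
    rec = sum(mse(linear(decoder_w[m], common[m] + private[m]), features[m])
              for m in names)
    return sim, diff, rec

# Toy data: both modalities share the first coordinate and differ in the second.
features = {"text": [1.0, 2.0], "audio": [1.0, 5.0]}
keep_first = [[1.0, 0.0], [0.0, 0.0]]    # hypothetical "common" encoder
keep_second = [[0.0, 0.0], [0.0, 1.0]]   # hypothetical "private" encoder
decoder = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

sim, diff, rec = disentangle_losses(
    features,
    common_w={m: keep_first for m in features},
    private_w={m: keep_second for m in features},
    decoder_w={m: decoder for m in features},
)
print(sim, diff, rec)
```

In this toy setup all three losses are zero, because the hand-picked maps separate the shared coordinate from the modality-specific one perfectly and the decoder recovers the input exactly. In a trained model these terms would be minimized jointly with the sentiment prediction objective rather than set by hand.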