An Autoencoder-based Self-Supervised Learning for Multimodal Sentiment Analysis

Wenjun Feng, Xin Wang, Donglin Cao, Dazhen Lin
DOI: https://doi.org/10.1016/j.ins.2024.120682
IF: 8.1
2024-05-24
Information Sciences
Abstract: Representation learning is a crucial and challenging task in multimodal sentiment analysis. Effective multimodal sentiment representations have two key aspects: consistency and difference. However, state-of-the-art multimodal sentiment analysis approaches fail to capture both the difference and the consistency of sentiment information across modalities. To address this representation problem, we propose an autoencoder-based self-supervised learning framework. In the pre-training stage, an autoencoder is designed for each modality; it leverages unlabeled data to learn richer sentiment representations for that modality through two tasks, sample reconstruction and modality consistency detection. In the fine-tuning stage, the pre-trained autoencoders are injected into MulT (AE-MT), and a contrastive learning auxiliary task is added to enhance the model's ability to extract deep sentiment information. Experiments on the popular Chinese sentiment analysis benchmark CH-SIMS v2.0 and the English sentiment analysis benchmark MOSEI demonstrate significant gains over baseline models.
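The abstract names three training signals: sample reconstruction and modality consistency detection during pre-training, and a contrastive auxiliary task during fine-tuning. The sketch below shows one plausible form of each signal in plain Python. It is not the authors' implementation. The concrete loss choices are assumptions: mean squared error for reconstruction, binary cross-entropy for consistency detection, and InfoNCE for the contrastive term. The helper `make_consistency_pairs`, which builds aligned/mismatched text-audio pairs, is likewise hypothetical.

```python
import math
import random

def mse(x, x_hat):
    """Assumed sample-reconstruction loss: mean squared error between an
    input feature vector and its autoencoder reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def bce(p, label):
    """Assumed consistency-detection loss: binary cross-entropy on the
    predicted probability p that two modality features come from the
    same sample (label 1 = aligned, 0 = mismatched)."""
    eps = 1e-12  # avoid log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))

def make_consistency_pairs(text_feats, audio_feats, rng):
    """Hypothetical pair builder: for each sample i, emit the aligned pair
    (i, i) with label 1 and a mismatched pair (i, j != i) with label 0.
    Requires at least two samples."""
    n = len(text_feats)
    pairs = []
    for i in range(n):
        pairs.append((text_feats[i], audio_feats[i], 1))
        j = rng.choice([k for k in range(n) if k != i])
        pairs.append((text_feats[i], audio_feats[j], 0))
    return pairs

def cosine(u, v):
    """Cosine similarity with a small guard against zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def info_nce(anchors, positives, temperature=0.1):
    """Assumed contrastive auxiliary loss (InfoNCE): anchors[i] should be
    more similar to positives[i] than to any other positives[j]."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(anchors)
```

Under these assumptions, a pre-training step would sum the per-modality reconstruction terms and the consistency-detection term over a batch, and fine-tuning would add `info_nce` to the main sentiment loss. The abstract does not state how the terms are weighted or which representations are paired as positives.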
computer science, information systems
What problem does this paper attempt to address?