RNG: Reducing Multi-level Noise and Multi-grained Semantic Gap for Joint Multimodal Aspect-Sentiment Analysis

Yaxin Liu, Yan Zhou, Ziming Li, Jinchuan Zhang, Yu Shang, Chenyang Zhang, Songlin Hu
2024-05-20
Abstract: As an important multimodal sentiment analysis task, Joint Multimodal Aspect-Sentiment Analysis (JMASA), which aims to jointly extract aspect terms and their associated sentiment polarities from given text-image pairs, has attracted increasing attention. Existing works encounter two limitations: (1) multi-level modality noise, i.e., instance- and feature-level noise; and (2) multi-grained semantic gap, i.e., coarse- and fine-grained gap. Both issues may interfere with accurate identification of aspect-sentiment pairs. To address these limitations, we propose a novel framework named RNG for JMASA. Specifically, to simultaneously reduce multi-level modality noise and multi-grained semantic gap, we design three constraints: (1) Global Relevance Constraint (GR-Con) based on text-image similarity for instance-level noise reduction, (2) Information Bottleneck Constraint (IB-Con) based on the Information Bottleneck (IB) principle for feature-level noise reduction, and (3) Semantic Consistency Constraint (SC-Con) based on mutual information maximization in a contrastive learning manner for multi-grained semantic gap reduction. Extensive experiments on two datasets show that RNG achieves new state-of-the-art performance.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses two main challenges in the Joint Multimodal Aspect-Sentiment Analysis (JMASA) task:

1. **Multi-level modality noise**: This includes instance-level noise and feature-level noise. Instance-level noise refers to the irrelevance between text and images, while feature-level noise refers to the noisy features within each modality.
2. **Multi-grained semantic gap**: This includes coarse-grained and fine-grained semantic gaps. These gaps make it difficult to accurately align aspects and their sentiment polarities extracted from text and images.

To tackle these issues, the paper proposes a new framework named RNG, which includes three constraints:

- Global Relevance Constraint (GR-Con): Reduces instance-level noise based on the similarity between text and images.
- Information Bottleneck Constraint (IB-Con): Reduces feature-level noise based on the Information Bottleneck (IB) principle.
- Semantic Consistency Constraint (SC-Con): Reduces the multi-grained semantic gap by maximizing mutual information in a contrastive learning manner.

Experimental results show that RNG achieves state-of-the-art performance on two benchmark datasets.
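
To make the three constraints more concrete, the following is a minimal NumPy sketch of the generic technique each one is described as building on. It is not the paper's implementation: the abstract does not give the loss formulas, so everything here is an assumption for illustration. Specifically, `relevance_scores` assumes text-image similarity is measured by cosine similarity, `gaussian_kl` assumes the IB compression term takes the common variational form (a KL divergence to a standard normal prior), and `info_nce` assumes mutual information is maximized through the standard InfoNCE contrastive lower bound. All function names and the `temperature` value are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def relevance_scores(text_emb, image_emb):
    """Assumed GR-Con-style signal: cosine similarity of each
    text-image pair, rescaled from [-1, 1] to [0, 1]."""
    t, v = l2_normalize(text_emb), l2_normalize(image_emb)
    cos = np.sum(t * v, axis=-1)
    return 0.5 * (cos + 1.0)


def gaussian_kl(mu, log_var):
    """Assumed IB-Con-style compression term: batch mean of
    KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)
    return kl.mean()


def info_nce(text_emb, image_emb, temperature=0.07):
    """Assumed SC-Con-style objective: InfoNCE loss where row i of
    text_emb and row i of image_emb form the positive pair and all
    other rows in the batch act as negatives."""
    t, v = l2_normalize(text_emb), l2_normalize(image_emb)
    logits = t @ v.T / temperature
    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative mean log-probability of the matching (diagonal) pairs.
    return -np.mean(np.diag(log_probs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(4, 16))
    image = text + 0.1 * rng.normal(size=(4, 16))  # roughly aligned pairs
    print("relevance:", relevance_scores(text, image))
    print("InfoNCE:", info_nce(text, image))
    print("KL:", gaussian_kl(rng.normal(size=(4, 8)), np.zeros((4, 8))))
```

In a setup like this, the relevance score could be used to down-weight the visual input for pairs where the image is unrelated to the text, while the KL and InfoNCE terms would be added to the task loss as regularizers. How RNG actually combines and weights its constraints is specified in the paper itself, not here.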