Abstract: Multimodal Aspect-Based Sentiment Analysis (MABSA) combines text and images to perform sentiment analysis but often struggles with irrelevant or misleading visual information. Existing methods typically address either sentence-image denoising or aspect-image denoising, but they do not tackle both types of noise together. To address this limitation, we propose DualDe, a novel approach with two components: the Hybrid Curriculum Denoising Module (HCD) and the Aspect-Enhance Denoising Module (AED). The HCD module strengthens sentence-image denoising through a flexible curriculum learning strategy that prioritizes training on clean data. The AED module mitigates aspect-image noise through an aspect-guided attention mechanism that filters out noisy visual regions unrelated to the aspects of interest. Experimental evaluations on benchmark datasets show that our approach effectively addresses both sentence-image and aspect-image noise.
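The abstract describes AED only at a high level. The sketch below is a rough illustration of what "aspect-guided attention that filters out noisy visual regions" could look like. It scores each image-region feature against an aspect embedding with scaled dot-product attention, zeroes out regions whose weight falls below a threshold, and pools the remaining regions. The function name, the `keep_threshold` parameter, and the hard-masking step are all assumptions made for illustration. This is not the authors' AED implementation.

```python
import math
from typing import List, Sequence, Tuple


def aspect_guided_pooling(
    aspect: Sequence[float],
    regions: Sequence[Sequence[float]],
    keep_threshold: float = 0.1,
) -> Tuple[List[float], List[float]]:
    """Hypothetical aspect-guided filtering of image regions.

    aspect:  d-dimensional embedding of the aspect term (assumed input).
    regions: k region features, each d-dimensional (assumed input).
    keep_threshold: hypothetical cutoff; regions with lower attention
        weight are treated as aspect-irrelevant noise and dropped.
    Returns the pooled d-dimensional feature and the final region weights.
    """
    d = len(aspect)
    # Scaled dot-product relevance of each region to the aspect.
    scores = [sum(r * a for r, a in zip(region, aspect)) / math.sqrt(d) for region in regions]
    # Numerically stable softmax over regions.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Drop regions whose weight falls below the threshold.
    kept = [w if w >= keep_threshold else 0.0 for w in weights]
    if sum(kept) == 0.0:
        # Fall back to the single most relevant region.
        best = max(range(len(weights)), key=weights.__getitem__)
        kept[best] = weights[best]
    # Renormalize the surviving weights so they sum to 1.
    norm = sum(kept)
    kept = [w / norm for w in kept]
    # Weighted sum of the retained region features.
    pooled = [sum(w * region[j] for w, region in zip(kept, regions)) for j in range(d)]
    return pooled, kept
```

With the aspect `[1.0, 0.0]` and the regions `[[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]`, the region pointing away from the aspect gets a softmax weight below 0.1. The function masks that region out, so the pooled feature is built only from the two regions that remain. Hard masking is one simple way to "filter out" regions. A soft gate would be an equally plausible reading of the abstract.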


### What problems does this paper attempt to solve?

This paper addresses two kinds of noise in Multimodal Aspect-Based Sentiment Analysis (MABSA): **sentence-image noise** and **aspect-image noise**.

1. **Sentence-image noise**: In real-world data, many images are unrelated to the accompanying text and may even mislead the interpretation of its context and sentiment. Such irrelevant or misleading visual information lowers the accuracy of sentiment analysis.
2. **Aspect-image noise**: Even when an image is relevant to the text as a whole, some of its regions may be irrelevant to a specific aspect and therefore introduce noise. For example, blurry or unrelated regions of the image can interfere with understanding a specific aspect.

Existing methods usually handle only one of these noise types and fail to address both together. To overcome this limitation, the paper proposes **DualDe**, which consists of two modules:

- **Hybrid Curriculum Denoising Module (HCD)**: strengthens sentence-image denoising with a flexible curriculum learning strategy that trains on clean data first.
- **Aspect-Enhance Denoising Module (AED)**: uses an aspect-guided attention mechanism to filter out visual regions that are irrelevant to a specific aspect, which improves image-text alignment.

By combining the two modules, DualDe handles both sentence-image and aspect-image noise more effectively, improving the accuracy of multimodal aspect-based sentiment analysis.

### Formula summary

The paper uses the following formulas:

1. **Similarity score**:
   \[ S(X_T^i, Y_I^i) = \cos(X_T^i, Y_I^i) \]
   where \(S\) is the similarity score, \(\cos(\cdot)\) is cosine similarity, and \(X_T^i\) and \(Y_I^i\) are the text and visual features of the \(i\)-th sample, extracted with the pre-trained CLIP model.
2. **Sentence-level difficulty normalization**:
   \[ d_s^i = 1.0 - \frac{S(X_T^i, Y_I^i)}{\max_{1 \leq k \leq N} S(X_T^k, Y_I^k)} \]
   where \(N\) is the number of samples in the training set. \(d_s^i\) is normalized to the range [0.0, 1.0], and image-text pairs with lower similarity receive higher difficulty.
3. **Individual loss**:
   \[ L_i = -\sum_{t=1}^{O} \log P(y_t \mid Y_{<t}, X_i) \]
   where \(L_i\) is the loss of the \(i\)-th sample, \(X_i\) is its input, and \(O\) is the output sequence length.
4. **Comprehensive difficulty**:
   \[ d_c^i = \alpha \cdot d_l^i + (1 - \alpha) \cdot d_s^i \]
   where \(\alpha\) is a weighting factor that balances the contributions of the loss-based difficulty \(d_l^i\) and the sentence-level difficulty \(d_s^i\).
5. **Model learning ability**:
   \[ p(t) = \begin{cases} \left(\frac{t}{T}\right)^{1-\lambda_2^{\text{init}}} + \lambda_2^{\text{init}} & \text{if } t \leq T \\ 1.0 & \text{otherwise} \end{cases} \]
   where \(t\) is the current training step, \(T\) is the step after which \(p(t) = 1.0\), and \(\lambda_2^{\text{init}}\) is the value of \(p(t)\) at \(t = 0\).
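The formulas above describe how HCD scores training samples and schedules them. The following pure-Python sketch wires those formulas together, under several assumptions:

- The loss-based difficulty \(d_l^i\) is passed in already computed.
- \(p(t)\) is implemented exactly as written in the summary and then clamped to 1.0, because the printed formula can exceed 1.0 near \(t = T\). The clamp is an added assumption.
- Each step trains on the easiest \(\lceil p(t) \cdot N \rceil\) samples. This is a common competence-based curriculum convention, not something the summary states.

All function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """S(X_T, Y_I): cosine similarity between a text and an image feature vector."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def sentence_difficulty(sims: Sequence[float]) -> List[float]:
    """d_s^i = 1 - S_i / max_k S_k (assumes the maximum similarity is positive)."""
    s_max = max(sims)
    return [1.0 - s / s_max for s in sims]


def combined_difficulty(d_l: Sequence[float], d_s: Sequence[float], alpha: float) -> List[float]:
    """d_c^i = alpha * d_l^i + (1 - alpha) * d_s^i."""
    return [alpha * l + (1.0 - alpha) * s for l, s in zip(d_l, d_s)]


def learning_ability(t: int, T: int, lam_init: float) -> float:
    """p(t) as printed in the summary; the min(1.0, ...) clamp is an added assumption."""
    if t > T:
        return 1.0
    return min(1.0, (t / T) ** (1.0 - lam_init) + lam_init)


def curriculum_subset(d_c: Sequence[float], t: int, T: int, lam_init: float) -> List[int]:
    """Indices of the easiest ceil(p(t) * N) samples (hypothetical selection rule)."""
    n = len(d_c)
    k = max(1, math.ceil(learning_ability(t, T, lam_init) * n))
    # Sort sample indices from easiest (lowest d_c) to hardest.
    order = sorted(range(n), key=lambda i: d_c[i])
    return order[:k]
```

For example, take combined difficulties `[0.9, 0.1, 0.5]` and \(\lambda_2^{\text{init}} = 0.5\). At step 0, `curriculum_subset` returns only the two easiest samples, `[1, 2]`. Once \(p(t)\) reaches 1.0, it returns all three in easy-to-hard order. Training on a growing prefix of a difficulty-sorted list is how the "clean data first" behavior appears in this sketch.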

A Dual-Module Denoising Approach with Curriculum Learning for Enhancing Multimodal Aspect-Based Sentiment Analysis
