Aspects Are Anchors: Towards Multimodal Aspect-based Sentiment Analysis Via Aspect-driven Alignment and Refinement

Zhanpeng Chen,Zhihong Zhu,Wanshi Xu,Yunyan Zhang,Xian Wu,Yefeng Zheng
DOI: https://doi.org/10.1145/3664647.3681189
2024-01-01
Abstract:Given coupled sentence image pairs, Multimodal Aspect-based Sentiment Analysis (MABSA) aims to detect aspect terms and predict their sentiment polarity. While existing methods have made great efforts in aligning images and text for improved MABSA performance, they still struggle to effectively mitigate the challenge of the noisy correspondence problem (NCP): the text description is often not well-aligned with the visual content. To alleviate NCP, in this paper, we introduce Aspect-driven Alignment and Refinement (ADAR), which is a two-stage coarse-to-fine alignment framework. In the first stage, ADAR devises a novel Coarse-to-fine Aspect-driven Alignment Module, which introduces Optimal Transport (OT) to learn the coarse-grained alignment between visual and textual features. Then the adaptive filter bin is applied to remove the irrelevant image regions at a fine-grained level; In the second stage, ADAR introduces an Aspect-driven Refinement Module to further refine the cross-modality feature representation. Extensive experiments on two benchmark datasets demonstrate the superiority of our model over state-of-the-art performance in the MABSA task.
What problem does this paper attempt to address?