AoM: Detecting Aspect-oriented Information for Multimodal Aspect-Based Sentiment Analysis

Ru Zhou,Wenya Guo,Xumeng Liu,Shenglong Yu,Ying Zhang,Xiaojie Yuan
DOI: https://doi.org/10.48550/arXiv.2306.01004
2023-05-31
Computation and Language
Abstract: Multimodal aspect-based sentiment analysis (MABSA) aims to extract aspects from text-image pairs and recognize their sentiments. Existing methods make great efforts to align the whole image to corresponding aspects. However, different regions of the image may relate to different aspects in the same sentence, and coarsely establishing image-aspect alignment will introduce noise to aspect-based sentiment analysis (i.e., visual noise). Besides, the sentiment of a specific aspect can also be interfered with by descriptions of other aspects (i.e., textual noise). Considering the aforementioned noise, this paper proposes an Aspect-oriented Method (AoM) to detect aspect-relevant semantic and sentiment information. Specifically, an aspect-aware attention module is designed to simultaneously select textual tokens and image blocks that are semantically related to the aspects. To accurately aggregate sentiment information, we explicitly introduce sentiment embedding into AoM, and use a graph convolutional network to model the vision-text and text-text interaction. Extensive experiments demonstrate the superiority of AoM to existing methods. The source code is publicly released at https://github.com/SilyRab/AoM.
What problem does this paper attempt to address?
The paper aims to address two main issues in multimodal aspect-based sentiment analysis:

1. **Visual Noise Problem**: Existing methods typically align the entire image with the aspect in the text, which can introduce visual noise unrelated to the specific aspect into the sentiment analysis.
2. **Textual Noise Problem**: Descriptions of different aspects may interfere with each other, leading to inaccurate sentiment recognition.

To solve these problems, the paper proposes the Aspect-oriented Method (AoM), which is implemented through the following two key modules:

- **Aspect-Aware Attention Module (A3M)**: Used to select semantic information related to the specific aspect from images and text.
- **Aspect-Guided Graph Convolutional Network (AG-GCN)**: Used to effectively aggregate sentiment information related to the specific aspect and introduce external sentiment embeddings to mitigate sentiment confusion between different aspects.

Experimental results show that AoM outperforms existing methods on two benchmark datasets, Twitter2015 and Twitter2017, with significant improvements across multiple evaluation metrics.
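To make the two ideas concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (that lives in the linked repository) and it has no learned parameters: `attend` is generic scaled dot-product attention standing in for aspect-aware selection, and `gcn_layer` is a single mean-aggregation graph convolution standing in for the aggregation step. The function names, the toy vectors, and the toy adjacency matrix are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(aspect, items):
    """Scaled dot-product attention of one aspect vector over item vectors.

    `items` mixes text-token and image-block features (hypothetical toy data).
    Returns one weight per item; the weights sum to 1.
    """
    scale = math.sqrt(len(aspect))
    scores = [sum(a * x for a, x in zip(aspect, item)) / scale for item in items]
    return softmax(scores)


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1 (A + I) H W).

    Row normalisation (a mean over neighbours plus self) is a simplification
    chosen for readability; other normalisations are equally possible.
    """
    n = len(adj)
    out = []
    for i in range(n):
        # Edge weights from node i, with a self-loop added.
        row = [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        degree = sum(row)
        # Weighted mean of the neighbours' feature vectors.
        agg = [
            sum(row[j] * feats[j][k] for j in range(n)) / degree
            for k in range(len(feats[0]))
        ]
        # Linear projection followed by ReLU.
        out.append([
            max(0.0, sum(agg[k] * weight[k][c] for k in range(len(agg))))
            for c in range(len(weight[0]))
        ])
    return out


# Toy example: one aspect vector, two token features and one image-block feature.
aspect = [1.0, 0.0]
items = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
weights = attend(aspect, items)

# Hypothetical interaction graph: node 1 is linked to nodes 0 and 2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = gcn_layer(adj, items, identity)
```

In this toy setup the item most aligned with the aspect vector receives the largest attention weight, and each node's new representation is the average of its own and its neighbours' features. In a trained model the projection matrix and the edge weights would be learned or derived from the data rather than fixed by hand.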