Towards Multimodal Sarcasm Detection Via Label-Aware Graph Contrastive Learning with Back-Translation Augmentation

Yiwei Wei, Maomao Duan, Hengyang Zhou, Zhiyang Jia, Zengwei Gao, Longbiao Wang
DOI: https://doi.org/10.1016/j.knosys.2024.112109
IF: 8.139
2024-01-01
Knowledge-Based Systems
Abstract: Multimodal sarcasm detection, as a sentiment analysis task, has witnessed great strides owing to the rapid development of multimodal machine learning. However, existing graph-based studies mainly focus on capturing the atomic-aware relations between textual and visual graphs within individual instances, neglecting label-aware connections between different instances. To address this limitation, we propose a novel Label-aware Graph Contrastive Learning (LGCL) method that detects ironic cues from a label-aware perspective of multimodal data. We first construct unimodal graphs for each instance and fuse them in a graph semantic space to obtain multimodal graphs. Then, we introduce two label-aware graph contrastive losses, Label-aware Unimodal Contrastive Loss (LUCL) and Label-aware Multimodal Contrastive Loss (LMCL), to make the model aware of the shared ironic cues related to sentiment labels within multimodal graph representations. Additionally, we propose Back-translation Data Augmentation (BTrA) for both textual and visual data to enhance contrastive learning, where different back-translation schemes are designed to generate a larger number of positive and negative samples. Experimental results on two public datasets demonstrate that our method achieves state-of-the-art (SOTA) performance compared with previous methods.
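The abstract does not give the exact form of LUCL or LMCL. As a rough illustration of what "label-aware" contrast between instances can mean, the sketch below implements a generic supervised-contrastive-style loss in which instances sharing a label are treated as positives and all other instances in the batch as negatives. The function name, the temperature value, and the use of plain vectors in place of graph representations are assumptions made for illustration, not the paper's formulation.

```python
import math


def label_aware_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised-contrastive-style loss (illustrative assumption).

    Same-label instances are positives; every other instance is a negative.
    """
    n = len(embeddings)
    if n < 2:
        return 0.0

    # L2-normalise so dot products are cosine similarities
    # (assumes every embedding is a non-zero vector).
    unit = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        unit.append([x / norm for x in vec])

    per_anchor = []
    for i in range(n):
        # Temperature-scaled similarity from anchor i to every other instance.
        logits = {
            j: sum(a * b for a, b in zip(unit[i], unit[j])) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in logits if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing

        # Numerically stable log-sum-exp over all non-anchor instances.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))

        # Average negative log-probability of picking a same-label instance.
        per_anchor.append(
            -sum(logits[j] - log_denom for j in positives) / len(positives)
        )

    return sum(per_anchor) / len(per_anchor) if per_anchor else 0.0
```

Under this sketch, a batch whose same-label instances already lie close together yields a loss near zero, while a batch whose labels cut across the clusters yields a much larger loss. In a setup like the one the abstract describes, the inputs would presumably be unimodal or multimodal graph representations, with augmented samples added to the batch to supply more positives and negatives.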