Modeling Incongruity Between Modalities for Multimodal Sarcasm Detection

Yang Wu, Yanyan Zhao, Xin Lu, Bing Qin, Yin Wu, Jian Sheng, Jinlong Li
DOI: https://doi.org/10.1109/mmul.2021.3069097
IF: 3.4911
2021-01-01
IEEE Multimedia
Abstract: Sarcasm is a sophisticated linguistic phenomenon that is common on social media platforms and poses a great challenge for opinion mining systems. Multimodal sarcasm detection, which aims to understand the implied sentiment in a video, has therefore gained increasing attention. However, previous work mostly focuses on multimodal feature fusion without explicitly modeling the incongruity between modalities, such as paying a verbal compliment while rolling one's eyes, which is an obvious cue for detecting sarcasm. In this article, we propose the incongruity-aware attention network (IWAN), which detects sarcasm by focusing on word-level incongruity between modalities via a scoring mechanism. This scoring mechanism assigns larger weights to words with incongruent modalities. Experimental results demonstrate the effectiveness of the proposed IWAN model, which not only achieves state-of-the-art performance on the MUStARD dataset but also offers the advantage of interpretability.
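The idea of weighting words by cross-modal incongruity can be illustrated with a minimal sketch. This is not the IWAN formulation, which the abstract does not specify. It assumes each word already has a scalar sentiment score from the text and another from a nonverbal modality, and it takes a softmax over the absolute gap between them. The function name `incongruity_weights` and all scores below are hypothetical.

```python
import math


def incongruity_weights(text_scores, nonverbal_scores):
    """Return one attention weight per word (illustrative only, not IWAN).

    Assumption: each word has a scalar sentiment score per modality.
    The weight is a softmax over the absolute gap between the two scores,
    so words whose modalities disagree most receive the largest weights.
    """
    if len(text_scores) != len(nonverbal_scores):
        raise ValueError("both modalities need exactly one score per word")
    gaps = [abs(t - n) for t, n in zip(text_scores, nonverbal_scores)]
    if not gaps:
        return []
    peak = max(gaps)  # subtract the maximum for numerical stability
    exps = [math.exp(g - peak) for g in gaps]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores in [-1, 1]: the word "great" is a verbal compliment,
# while the nonverbal cue aligned with it (e.g. an eye roll) is negative.
words = ["that", "was", "great"]
text_scores = [0.0, 0.0, 0.9]
nonverbal_scores = [0.0, -0.1, -0.8]

weights = incongruity_weights(text_scores, nonverbal_scores)
for word, weight in zip(words, weights):
    print(f"{word}: {weight:.3f}")
```

In this toy setup the largest weight falls on "great", the word where the two modalities disagree most. Per-word weights of this kind are also what make such a model inspectable, which is in the spirit of the interpretability the abstract mentions.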