An attention-based, context-aware multimodal fusion method for sarcasm detection using inter-modality inconsistency

Yangyang Li, Yuelin Li, Shihuai Zhang, Guangyuan Liu, Yanqiao Chen, Ronghua Shang, Licheng Jiao
DOI: https://doi.org/10.1016/j.knosys.2024.111457
IF: 8.139
2024-02-01
Knowledge-Based Systems
Abstract: Sarcasm is a subtle and complex form of expression that is difficult to detect, especially in social media and metaverse applications, where communication extends beyond text to include video, images, and audio. Traditional sarcasm detection methods that rely solely on text often fail to capture the emotional incongruities and subtleties inherent in sarcasm. To address these challenges, this paper introduces a novel multimodal sarcasm detection method that not only processes multimodal data but also models the emotional mismatch between modalities, a crucial aspect often overlooked by conventional approaches. The method combines three components: an inter-modal emotional inconsistency detection mechanism, a contextual scenario inconsistency detection mechanism, and a cross-modal and segmented attention mechanism. Together, these components yield a richer and more nuanced feature representation that captures sarcasm more effectively. Experimental results on the MUStARD Extended dataset show that the approach outperforms existing models, establishing a new state of the art in sarcasm detection.
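The abstract names cross-modal attention and inter-modal emotional inconsistency without giving their details. The following is a minimal sketch of the general idea only, not the authors' implementation. It assumes standard scaled dot-product attention with queries from one modality and keys/values from another, and it uses a hypothetical mismatch measure (one minus cosine similarity) between per-modality emotion vectors. The function names and toy inputs are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention across modalities.

    queries come from one modality (e.g. text tokens); keys and values
    come from another (e.g. audio frames). Generic formulation, assumed
    here for illustration; the paper's exact variant is not given.
    """
    d = len(keys[0])
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Each output row is a weighted average of the value rows.
        attended.append([sum(w * v[j] for w, v in zip(weights, values))
                         for j in range(len(values[0]))])
    return attended


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def inconsistency_score(emotion_a, emotion_b):
    """Hypothetical mismatch measure: 1 - cosine similarity.

    Near 0 when two modalities express the same emotion, larger when
    they disagree. An assumption for this sketch, not the paper's metric.
    """
    return 1.0 - cosine_similarity(emotion_a, emotion_b)


# Toy inputs (made up): 2-dimensional features per token / frame.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
audio_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_attending_to_audio = cross_modal_attention(
    text_tokens, audio_frames, audio_frames)

# Toy emotion vectors as (positive, negative) scores.
text_emotion = [0.9, 0.1]   # words sound positive
audio_emotion = [0.1, 0.9]  # tone sounds negative
mismatch = inconsistency_score(text_emotion, audio_emotion)
```

In this toy setup, a high `mismatch` value flags positive wording delivered in a negative tone, which is the kind of cross-modal incongruity the abstract describes as a cue for sarcasm. A real system would learn the feature extractors and the scoring function rather than fix them by hand.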
computer science, artificial intelligence
What problem does this paper attempt to address?