MMSD-CAF: MultiModal Sarcasm Detection using CoAttention and Fusion Mechanisms

H. A. P. Kashyap, Jayant Miglani, Madhav Maheshwari, Rahul Katarya
DOI: https://doi.org/10.1109/CONIT61985.2024.10626994
2024-06-21
Abstract: Many posts and comments on social media platforms, including but not limited to Twitter and Instagram, are intended sarcastically. Sarcasm is a form of figurative language that relies not only on the text but also on various contextual cues. In this work, we leverage multimodal cues in the form of textual and image data, with the goal of capturing, through deep learning techniques, how sarcasm arises from the relationship between the two modalities. We evolve our model over three iterations, each using different embedding representations and fusions: text embeddings from the FLAVA text encoder and the BERT model, and image embeddings from the FLAVA image encoder and the VGG19 model. We use co-attention, multi-head attention and keyless attention mechanisms to capture the contextual cues. We use the publicly available MMSD2.0 dataset throughout and evaluate the results on the test data in terms of accuracy, precision, recall and F1-score. In this paper, we show that our proposed model, hereafter named MMSD-CAF, which uses FLAVA text, image and multimodal embeddings together with co-attention and keyless-attention-based fusion, produces better results than the established models.
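The abstract names co-attention and keyless attention as the fusion mechanisms but does not give their equations. The following is a minimal pure-Python sketch of one common formulation, under stated assumptions: the text-image affinity is a plain dot product (no learned bilinear weight), each co-attended stream is mean-pooled, and keyless attention scores each candidate with a single vector `w` and no query or key. The random vectors only stand in for FLAVA text, image and multimodal embeddings, and `co_attention`, `keyless_attention` and `fuse` are hypothetical names, not the authors' code.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    # Sum of vectors scaled by their weights, computed per dimension.
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def co_attention(text, image):
    # Assumption: affinity C[i][j] = <t_i, v_j> with no learned weight matrix.
    affinity = [[dot(t, v) for v in image] for t in text]
    # Text-to-image: each text token receives a softmax-weighted summary of image regions.
    text_ctx = [weighted_sum(softmax(row), image) for row in affinity]
    # Image-to-text: each image region attends over text tokens (a column of the affinity).
    image_ctx = [
        weighted_sum(softmax([affinity[i][j] for i in range(len(text))]), text)
        for j in range(len(image))
    ]
    return text_ctx, image_ctx


def keyless_attention(vectors, w):
    # Keyless attention: score each vector with one vector w (no query or key),
    # then return the softmax-weighted sum and the attention weights.
    alphas = softmax([dot(w, v) for v in vectors])
    return weighted_sum(alphas, vectors), alphas


def fuse(text, image, multimodal, w):
    # Hypothetical fusion: pool the co-attended streams, then weigh the pooled
    # text, pooled image and multimodal vectors against each other.
    text_ctx, image_ctx = co_attention(text, image)
    candidates = [mean(text_ctx), mean(image_ctx), multimodal]
    return keyless_attention(candidates, w)


# Usage with random stand-ins for encoder outputs (illustrative only).
random.seed(0)
DIM = 4
text = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]   # 5 token embeddings
image = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]  # 3 patch embeddings
multimodal = [random.gauss(0, 1) for _ in range(DIM)]                 # 1 joint embedding
w = [random.gauss(0, 1) for _ in range(DIM)]                          # sampled, not learned
fused, alphas = fuse(text, image, multimodal, w)
print(len(fused), [round(a, 3) for a in alphas])
```

Because a keyless score needs no query from another modality, the fusion step can weigh the text, image and multimodal summaries against each other with a single vector; in a trained model `w` (and typically a projection applied before the dot product) would be learned rather than sampled, and the fused vector would feed a sarcasm classifier.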