What Do They “meme”? A Metaphor-Aware Multi-Modal Multi-Task Framework for Fine-Grained Meme Understanding

Bingbing Wang, Shijue Huang, Bin Liang, Geng Tu, Min Yang, Ruifeng Xu
DOI: https://doi.org/10.1016/j.knosys.2024.111778
IF: 8.139
2024-01-01
Knowledge-Based Systems
Abstract: Fine-grained meme understanding aims to explore and comprehend the meanings of memes from multiple perspectives by performing various tasks, such as sentiment analysis, intention detection, and offensiveness detection. Existing approaches primarily focus on simple multi-modality fusion and individual task analysis. However, there remain several limitations that need to be addressed: (1) the neglect of incongruous features within and across modalities, and (2) the lack of consideration for correlations among different tasks. To this end, we leverage metaphorical information as text modality and propose a Metaphor-aware Multi-modal Multi-task Framework (M3F) for fine-grained meme understanding. Specifically, we create inter-modality attention enlightened by the Transformer to capture inter-modality interaction between text and image. Moreover, intra-modality attention is applied to model the contradiction between the text and metaphorical information. To learn the implicit interaction among different tasks, we introduce a multi-interactive decoder that exploits gating networks to establish the relationship between various subtasks. Experimental results on the MET-Meme dataset show that the proposed framework outperforms the state-of-the-art baselines in fine-grained meme understanding.
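The abstract names two mechanisms without giving their equations: Transformer-style attention between text and image, and gating networks that relate subtasks. The following is a minimal sketch of those two generic ideas, not the M3F implementation. It assumes scaled dot-product attention for the inter-modality step and a softmax gate for the task-mixing step; the function names (`cross_attend`, `gated_mix`), the gate parameterization, and the toy vectors are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(query, keys, values):
    """Scaled dot-product attention of one query vector (e.g. a text token)
    over key/value vectors from another modality (e.g. image regions).
    Assumption: the paper's inter-modality attention may differ in detail."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def gated_mix(own, others, gate_weights):
    """Mix one task's feature with other tasks' features through a softmax gate.
    gate_weights holds one hypothetical weight vector per candidate
    (the task's own feature first, then the others)."""
    candidates = [own] + others
    gates = softmax([dot(w, own) for w in gate_weights])
    dim = len(own)
    mixed = [sum(g * c[i] for g, c in zip(gates, candidates)) for i in range(dim)]
    return mixed, gates


# Toy usage: two text tokens attend over two image regions.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_regions = [[1.0, 0.0], [0.5, 0.5]]
fused = [cross_attend(t, image_regions, image_regions) for t in text_tokens]

# Toy usage: a sentiment feature is mixed with an offensiveness feature.
sentiment_feat, offensive_feat = fused[0], fused[1]
mixed, gates = gated_mix(
    sentiment_feat, [offensive_feat], [[0.3, -0.1], [0.2, 0.4]]
)
```

In this sketch the gate decides how much of another task's representation flows into the current task, which is one common way to model correlations among tasks; the paper's multi-interactive decoder may be structured differently.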