Abstract: Internet memes have gained significant influence in communicating political, psychological, and sociocultural ideas. While memes are often humorous, there has been a rise in the use of memes for trolling and cyberbullying. Although a wide variety of effective deep learning-based models have been developed for detecting offensive multimodal memes, only a few works have addressed explainability. Recent regulations, such as the "right to explanation" in the General Data Protection Regulation, have spurred research into interpretable models rather than a sole focus on performance. Motivated by this, we introduce *MultiBully-Ex*, the first benchmark dataset for multimodal explanation of code-mixed cyberbullying memes. Here, both visual and textual modalities are highlighted to explain why a given meme is cyberbullying. A Contrastive Language-Image Pretraining (CLIP) projection-based multimodal shared-private multitask approach is proposed for visual and textual explanation of a meme. Experimental results demonstrate that training with multimodal explanations yields reliable improvements in generating textual justifications and in identifying the visual evidence supporting a decision.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper aims to address the issue of multimodal explanation of cyberbullying in memes. Specifically, the authors focus on how to identify and understand cyberbullying in memes through multimodal explanations, including textual and visual explanations.

### Background and Motivation

1. **Popularity of Memes**: With the proliferation of social media platforms, memes have become an important form of multimodal content used to convey political, psychological, and socio-cultural ideas. While many memes are humorous, they are increasingly being used for cyberbullying and harassment.
2. **Harm of Cyberbullying**: Cyberbullying not only causes psychological harm to victims but can also lead to despair, anxiety, decreased self-esteem, and even suicidal thoughts. Therefore, technologies for automatic detection of cyberbullying have become particularly important.
3. **Limitations of Existing Research**: Although some studies have focused on using deep learning models to detect cyberbullying in memes, most of these studies primarily focus on classification tasks, with relatively few focusing on interpretability. Research on code-mixed language memes, in particular, is even more limited.

### Research Objectives

1. **Propose a New Task**: The authors propose a new task called "Multimodal Explanation of Code-Mixed Cyberbullying Memes" (MExCCM), which aims to generate multimodal explanations of memes, including textual and visual explanations.
2. **Construct a Dataset**: To support this task, the authors constructed the first multimodal explainable code-mixed cyberbullying dataset (MultiBully-Ex), which includes manually annotated textual and image parts to explain why a meme is considered cyberbullying.
3. **Develop a New Model**: The authors propose a multimodal shared-private multitask architecture based on Contrastive Language-Image Pretraining (CLIP) to generate textual and visual explanations of memes.

### Main Contributions

1. **Propose a New Task**: For the first time, the MExCCM task is proposed, filling a gap in the field of multimodal explanations.
2. **Construct a Dataset**: The creation of the MultiBully-Ex dataset provides valuable resources for research.
3. **Develop a New Model**: The proposed CLIP-based multimodal shared-private multitask architecture improves the accuracy of generating textual explanations and identifying visual evidence.

### Experimental Results

Experimental results show that training with multimodal explanations can significantly improve the performance of generating textual explanations and more accurately identify visual evidence supporting the decision. Particularly in a multitask setting, the CLIP projection-based multimodal shared-private multitask method significantly outperforms all single-task baseline models.
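
### Illustrative Sketch of a Shared-Private Multitask Layout

The summary names a CLIP projection-based shared-private multitask architecture but gives no implementation details. The following is only a minimal sketch of the generic shared-private pattern, not the authors' model. It assumes that image and text features (stand-ins for CLIP embeddings) are projected into a common space, passed through one encoder shared by both tasks and one private encoder per task, and then concatenated. The class name `SharedPrivateSketch`, the task names, all dimensions, and the random untrained weights are hypothetical and chosen for illustration.

```python
import random


def linear(in_dim, out_dim, rng):
    """Random (out_dim x in_dim) weight matrix; a stand-in for a trained layer."""
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply(weights, vec):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class SharedPrivateSketch:
    # Hypothetical task names mirroring the two explanation types in the summary.
    TASKS = ("textual_explanation", "visual_explanation")

    def __init__(self, img_dim, txt_dim, proj_dim, hid_dim, seed=0):
        rng = random.Random(seed)
        # Assumption: each modality gets its own projection into a common space.
        self.img_proj = linear(img_dim, proj_dim, rng)
        self.txt_proj = linear(txt_dim, proj_dim, rng)
        # One encoder shared by all tasks, plus one private encoder per task.
        self.shared = linear(2 * proj_dim, hid_dim, rng)
        self.private = {t: linear(2 * proj_dim, hid_dim, rng) for t in self.TASKS}

    def forward(self, img_feat, txt_feat):
        # Fuse the projected modalities by concatenation.
        fused = apply(self.img_proj, img_feat) + apply(self.txt_proj, txt_feat)
        shared = apply(self.shared, fused)
        # Each task sees [shared features ; its own private features].
        return {t: shared + apply(self.private[t], fused) for t in self.TASKS}


model = SharedPrivateSketch(img_dim=4, txt_dim=3, proj_dim=2, hid_dim=5)
features = model.forward([1.0, 0.5, -0.5, 2.0], [0.1, 0.2, 0.3])
for task, vec in features.items():
    print(task, len(vec))
```

In this layout the first half of every task's feature vector is identical across tasks (the shared part), while the second half is task-specific. A real system would replace the random matrices with trained layers and attach a text decoder and a visual localization head, neither of which is specified in the summary.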

Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes Through Multimodal Explanations
