Abstract: In the digital world, memes present a unique challenge for content moderation due to their potential to spread harmful content. Although detection methods have improved, proactive solutions such as intervention are still limited, with current research focusing mostly on text-based content, neglecting the widespread influence of multimodal content like memes. Addressing this gap, we present MemeGuard, a comprehensive framework leveraging Large Language Models (LLMs) and Visual Language Models (VLMs) for meme intervention. MemeGuard harnesses a specially fine-tuned VLM, VLMeme, for meme interpretation, and a multimodal knowledge selection and ranking mechanism (MKS) for distilling relevant knowledge. This knowledge is then employed by a general-purpose LLM to generate contextually appropriate interventions. Another key contribution of this work is the **I**ntervening **C**yberbullying in **M**ultimodal **M**emes (ICMM) dataset, a high-quality, labeled dataset featuring toxic memes and their corresponding human-annotated interventions. We leverage ICMM to test MemeGuard, demonstrating its proficiency in generating relevant and effective responses to toxic memes.

What problem does this paper attempt to address?

### The Problem the Paper Attempts to Solve

The paper attempts to address the challenge of content moderation in the digital world due to the widespread dissemination of memes. Specifically, although existing detection methods have improved, proactive interventions for multimodal content like memes remain limited. Current research mainly focuses on text-based content, neglecting the extensive impact of multimodal content such as memes.

**The main issues include:**

1. **Complexity and Cultural Connotations of Memes**: Memes often have a high degree of contextual dependency and cultural background, making it difficult for traditional Visual Language Models (VLMs) to accurately understand and interpret them.
2. **Limitations of Content Moderation**: Existing content moderation systems mainly rely on reactive measures, lacking effective preventive measures to proactively intervene in the spread of harmful content.
3. **Handling of Multimodal Content**: Current research primarily focuses on text content, ignoring the impact of multimodal content like memes, which exacerbates the potential risks of multimodal harmful content.

### Solution

To address the above issues, the authors propose a framework named **MemeGuard**, which leverages Large Language Models (LLMs) and Visual Language Models (VLMs) for meme intervention. Specifically:

1. **MemeGuard Framework**:
   - **VLMeme**: A specially fine-tuned Visual Language Model designed to understand and interpret the complexity of memes.
   - **Multimodal Knowledge Selection Mechanism (MKS)**: Used to filter task-relevant knowledge, avoiding interference from irrelevant information.
   - **Intervention Generation Module**: Utilizes refined knowledge to generate appropriate intervention content.
2. **ICMM Dataset**:
   - The authors developed a high-quality annotated dataset **ICMM**, which includes toxic memes and their corresponding manually annotated intervention content, to test the effectiveness of MemeGuard.

Through these methods, MemeGuard aims to generate relevant and effective responses to mitigate the negative impact of toxic memes and promote more positive and respectful online communication.
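
The select-then-generate flow described above (interpret the meme, keep only task-relevant knowledge, then prompt a general-purpose LLM) can be illustrated with a small sketch. This is not the authors' implementation: the paper summary does not specify how MKS scores knowledge, so the sketch swaps in plain token-overlap (Jaccard) similarity as a stand-in. The function names `select_knowledge` and `build_intervention_prompt`, the `top_k` and `min_score` parameters, and the prompt wording are all hypothetical, and the candidate knowledge sentences are assumed to come from a VLM such as VLMeme.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity in [0, 1]; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def select_knowledge(
    meme_text: str,
    candidates: list[str],
    top_k: int = 2,
    min_score: float = 0.05,
) -> list[str]:
    """Hypothetical stand-in for MKS: filter, then rank, candidate knowledge.

    Candidates scoring below min_score are dropped as irrelevant; the rest
    are sorted by similarity to the meme text and the top_k are returned.
    """
    query = tokenize(meme_text)
    scored = [(jaccard(query, tokenize(c)), c) for c in candidates]
    kept = [(score, c) for score, c in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept[:top_k]]


def build_intervention_prompt(meme_text: str, knowledge: list[str]) -> str:
    """Assemble an illustrative prompt for a general-purpose LLM."""
    facts = "\n".join(f"- {k}" for k in knowledge)
    return (
        "A meme contains the following text:\n"
        f"{meme_text}\n\n"
        "Relevant background knowledge:\n"
        f"{facts}\n\n"
        "Write a short, respectful intervention that discourages the "
        "harmful message of this meme."
    )
```

Given a meme caption and a few VLM-generated descriptions, `select_knowledge` would drop descriptions that share no vocabulary with the caption, and `build_intervention_prompt` would pass only the surviving ones to the LLM. A real system would likely replace the Jaccard score with an embedding-based or learned relevance model.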

MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention

MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets

A Multimodal Framework for the Detection of Hateful Memes

Meme-ingful Analysis: Enhanced Understanding of Cyberbullying in Memes Through Multimodal Explanations

Beneath the Surface: Unveiling Harmful Memes with Multimodal Reasoning Distilled from Large Language Models

OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst

Just KIDDIN: Knowledge Infusion and Distillation for Detection of INdecent Memes

Towards Low-Resource Harmful Meme Detection with LMM Agents

Detecting and Understanding Harmful Memes: A Survey

Advancing Content Moderation: Evaluating Large Language Models for Detecting Sensitive Content Across Text, Images, and Videos

MIMIC: Misogyny Identification in Multimodal Internet Content in Hindi-English Code-Mixed Language

Multimodal Deep Learning with Discriminant Descriptors for Offensive Memes Detection

KERMIT: Knowledge-EmpoweRed model in harmful meme deTection

MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing

A Review of Vision-Language Models and their Performance on the Hateful Memes Challenge

Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge

MemeFier: Dual-stage Modality Fusion for Image Meme Classification

Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models

MemeCraft: Contextual and Stance-Driven Multimodal Meme Generation

Multimodal Hate Speech Detection in Memes Using Contrastive Language-Image Pre-Training

A Multimodal Memes Classification: A Survey and Open Research Issues