Abstract: Current multimodal misinformation detection (MMD) methods often assume a single source and type of forgery for each sample, which is insufficient for real-world scenarios where multiple forgery sources coexist. The lack of a benchmark for mixed-source misinformation has hindered progress in this field. To address this, we introduce MMFakeBench, the first comprehensive benchmark for mixed-source MMD. MMFakeBench includes 3 critical sources: textual veracity distortion, visual veracity distortion, and cross-modal consistency distortion, along with 12 sub-categories of misinformation forgery types. We further conduct an extensive evaluation of 6 prevalent detection methods and 15 large vision-language models (LVLMs) on MMFakeBench under a zero-shot setting. The results indicate that current methods struggle under this challenging and realistic mixed-source MMD setting. Additionally, we propose an innovative unified framework, which integrates rationales, actions, and tool-use capabilities of LVLM agents, significantly enhancing accuracy and generalization. We believe this study will catalyze future research into more realistic mixed-source multimodal misinformation and provide a fair evaluation of misinformation detection methods.

What problem does this paper attempt to address?

The paper addresses a limitation of current multimodal misinformation detection (MMD) methods: they typically assume that each sample has a single forgery source and type. That assumption breaks down in real-world settings, where multiple forgery sources often coexist. The lack of a benchmark dataset for mixed-source misinformation has also hindered progress in this field.

To tackle these challenges, the authors propose **MMFakeBench**, the first comprehensive benchmark for mixed-source MMD. MMFakeBench covers three key sources of misinformation (textual veracity distortion, visual veracity distortion, and cross-modal consistency distortion) and 12 sub-categories of forgery types. Using this dataset, the authors ran zero-shot evaluations of 6 existing detection methods and 15 large vision-language models (LVLMs), and found that current methods struggle in this challenging mixed-source setting.

The authors also propose a unified framework, **MMD-Agent**, which integrates the reasoning, action, and tool-use capabilities of LVLM agents and significantly improves detection accuracy and generalization. MMD-Agent decomposes mixed-source detection into three stages: a textual veracity check, a visual veracity check, and cross-modal consistency reasoning, so that each source of distortion is examined systematically.

In summary, the main contributions of the paper are:

1. Introducing the mixed-source multimodal misinformation detection (MMD) setting, which removes the single-source assumption and brings the task closer to practical use.
2. Developing MMFakeBench, the first benchmark dataset for evaluating mixed-source MMD.
3. Benchmarking 6 popular detection methods and 15 LVLMs on the newly collected dataset.
4. Proposing the MMD-Agent framework, which significantly outperforms existing methods and LVLMs and provides a new baseline for future research.
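The three-stage decomposition described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Sample`, `Verdict`, `Source`, and `detect` are hypothetical, each stage is passed in as a plain callable standing in for an LVLM or tool call, and stopping at the first stage that flags a distortion is an assumption made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Source(Enum):
    """The three misinformation sources named in the summary."""
    TEXTUAL_VERACITY = "textual veracity distortion"
    VISUAL_VERACITY = "visual veracity distortion"
    CROSS_MODAL_CONSISTENCY = "cross-modal consistency distortion"


@dataclass
class Sample:
    text: str
    image_path: str


@dataclass
class Verdict:
    source: Optional[Source]  # None means no stage flagged a distortion
    rationale: str


# Hypothetical stage signature: return a rationale string when the stage
# flags a distortion, or None when the sample passes that stage.
Stage = Callable[[Sample], Optional[str]]


def detect(sample: Sample,
           text_check: Stage,
           visual_check: Stage,
           consistency_check: Stage) -> Verdict:
    """Run the three checks in order and report the first one that fires.

    The early exit is an assumption of this sketch, not a detail
    confirmed by the summary.
    """
    stages = [
        (Source.TEXTUAL_VERACITY, text_check),
        (Source.VISUAL_VERACITY, visual_check),
        (Source.CROSS_MODAL_CONSISTENCY, consistency_check),
    ]
    for source, check in stages:
        rationale = check(sample)
        if rationale is not None:
            return Verdict(source, rationale)
    return Verdict(None, "no stage flagged a distortion")
```

In use, each callable would wrap a model prompt or an external tool (for example, a retrieval step for the textual check); here any function with the `Stage` signature works, which keeps the control flow testable without a model.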

MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs
