Abstract: The proliferation of inflammatory or misleading "fake" news content has become increasingly common in recent years. Simultaneously, it has become easier than ever to use AI tools to generate photorealistic images depicting any scene imaginable. Combining these two -- AI-generated fake news content -- is particularly potent and dangerous. To combat the spread of AI-generated fake news, we propose the MiRAGeNews Dataset, a dataset of 12,500 high-quality real and AI-generated image-caption pairs from state-of-the-art generators. We find that our dataset poses a significant challenge to humans (60% F-1) and state-of-the-art multi-modal LLMs (< 24% F-1). Using our dataset we train a multi-modal detector (MiRAGe) that improves by +5.1% F-1 over state-of-the-art baselines on image-caption pairs from out-of-domain image generators and news publishers. We release our code and data to aid future work on detecting AI-generated content.

What problem does this paper attempt to address?

The paper addresses the spread of AI-generated fake news on social media, in particular news that pairs highly realistic images with misleading or harmful text. Specifically, it points out:

1. **Generation and Spread of Fake News**: Fake news has become increasingly common in recent years. As AI tools have become far better at generating highly realistic images, such content has become harder to distinguish from genuine news.
2. **Insufficiency of Existing Detection Methods**: Existing detection methods perform poorly on fake news produced by state-of-the-art generative models (such as diffusion models). They can usually catch only obvious anomalies and are inadequate for highly realistic images and text.
3. **Lack of Datasets**: There are few datasets of high-quality, highly realistic AI-generated images and text, which limits researchers' ability to develop and test new detection methods.

To address these issues, the paper proposes the following:

- **MiRAGeNews Dataset**: A dataset of 12,500 real and AI-generated image-caption pairs, with the generated content coming from state-of-the-art generators. It aims to reflect the characteristics of real-world fake news and to serve as a benchmark for evaluating and improving detection methods.
- **MiRAGe Detector**: A multimodal detector (MiRAGe) trained on this dataset. It integrates image and text information to improve fake news detection, especially generalization to unseen image generators and news publishers.

Through these contributions, the paper aims to give the research community a strong tool for tackling the growing problem of AI-generated fake news.
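To make the idea of a detector that integrates image and text information more concrete, here is a minimal sketch of one common way to combine two modalities (a weighted average of per-modality scores, often called late fusion) and of how an F-1 score is computed from the resulting predictions. This is not the MiRAGe architecture, which this summary does not describe. The function names `fuse_scores` and `f1_score`, the equal weighting, the 0.5 threshold, and the toy scores are all assumptions made for illustration.

```python
from typing import Sequence


def fuse_scores(image_score: float, text_score: float, image_weight: float = 0.5) -> float:
    """Weighted average of two per-modality probabilities that a pair is AI-generated.

    Hypothetical late-fusion rule; the weighting is an assumption, not the paper's method.
    """
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")
    return image_weight * image_score + (1.0 - image_weight) * text_score


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F-1 for the positive class (label 1 = AI-generated)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy scores (made up): each entry is one image-caption pair.
image_scores = [0.9, 0.2, 0.6, 0.1]  # hypothetical image-detector outputs
text_scores = [0.7, 0.4, 0.3, 0.2]   # hypothetical text-detector outputs
labels = [1, 0, 1, 0]                # 1 = AI-generated, 0 = real

fused = [fuse_scores(i, t) for i, t in zip(image_scores, text_scores)]
preds = [1 if s >= 0.5 else 0 for s in fused]  # assumed decision threshold
f1 = f1_score(labels, preds)
print(preds, round(f1, 3))
```

In this toy run the third pair is missed because its fused score falls below the threshold, so precision is 1.0, recall is 0.5, and F-1 is 2/3. A learned fusion layer or a tuned threshold would be natural alternatives to the fixed average used here.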

MiRAGeNews: Multimodal Realistic AI-Generated News Detection

RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection

MRDCA: a multimodal approach for fine-grained fake news detection through integration of RoBERTa and DenseNet based upon fusion mechanism of co-attention

The Reopening of Pandora's Box: Analyzing the Role of LLMs in the Evolving Battle Against AI-Generated Fake News

Real-time Fake News from Adversarial Feedback

A Multimodal Approach for Detecting AI Generated Content using BERT and CNN

Official-NV: An LLM-Generated News Video Dataset for Multimodal Fake News Detection

Detecting Out-of-Context Image-Caption Pairs in News: A Counter-Intuitive Method

Style-News: Incorporating Stylized News Generation and Adversarial Verification for Neural Fake News Detection

A Sanity Check for AI-generated Image Detection

Construction of Multi-Modal Social Media Dataset for Fake News Detection

GenImage: A Million-Scale Benchmark for Detecting AI-Generated Image

Crafting Tomorrow's Headlines: Neural News Generation and Detection in English, Turkish, Hungarian, and Persian

A Self-Learning Multimodal Approach for Fake News Detection

J-Guard: Journalism Guided Adversarially Robust Detection of AI-generated News

The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking

Detection of AI-Generated Synthetic Images with a Lightweight CNN

Robust Detection of Fake News Using LSTM and GloVe Embeddings

Deep Learning Multimodal Methods to Detect Fake News

DETER: Detecting Edited Regions for Deterring Generative Manipulations

The Tug-of-War Between Deepfake Generation and Detection