MiRAGeNews: Multimodal Realistic AI-Generated News Detection

Runsheng Huang, Liam Dugan, Yue Yang, Chris Callison-Burch
2024-10-12
Abstract: The proliferation of inflammatory or misleading "fake" news content has become increasingly common in recent years. Simultaneously, it has become easier than ever to use AI tools to generate photorealistic images depicting any scene imaginable. Combining these two -- AI-generated fake news content -- is particularly potent and dangerous. To combat the spread of AI-generated fake news, we propose the MiRAGeNews Dataset, a dataset of 12,500 high-quality real and AI-generated image-caption pairs from state-of-the-art generators. We find that our dataset poses a significant challenge to humans (60% F-1) and state-of-the-art multi-modal LLMs (< 24% F-1). Using our dataset we train a multi-modal detector (MiRAGe) that improves by +5.1% F-1 over state-of-the-art baselines on image-caption pairs from out-of-domain image generators and news publishers. We release our code and data to aid future work on detecting AI-generated content.
Computer Vision and Pattern Recognition, Computation and Language
What problem does this paper attempt to address?
The paper addresses the spread of AI-generated fake news on social media, in particular news that pairs highly realistic generated images with misleading or harmful text. It identifies three problems:

1. **Generation and Spread of Fake News**: Fake news has become increasingly common in recent years. Advances in AI have greatly improved the ability to generate highly realistic images, which makes such fake news harder to distinguish from real reporting.
2. **Insufficiency of Existing Detection Methods**: Existing detection methods perform poorly on fake news produced by state-of-the-art generative models (such as diffusion models). They can usually detect only obvious anomalies and are inadequate for highly realistic images and text.
3. **Lack of Datasets**: There is a lack of datasets containing high-quality, highly realistic AI-generated images and text, which limits researchers' ability to develop and test new detection methods.

To address these problems, the paper proposes:

- **MiRAGeNews Dataset**: A dataset of 12,500 real and AI-generated image-text pairs, with the generated pairs produced by state-of-the-art generative models. It aims to simulate the characteristics of real-world fake news and to serve as a benchmark for evaluating and improving detection methods.
- **MiRAGe Detector**: A multimodal detector (MiRAGe) trained on this dataset. It integrates image and text information to improve fake news detection, especially generalization to unseen generative models and news sources.

Through these contributions, the paper aims to give the research community tools for tackling the increasingly serious problem of AI-generated fake news.
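The abstract reports all results as F-1 scores. As a reminder of what that number measures, here is a minimal sketch of the standard binary F-1 computation (the harmonic mean of precision and recall). The function name `f1_score`, the label convention (1 = AI-generated, 0 = real), and the toy labels are assumptions made for illustration; none of this is taken from the paper's released code or data.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F-1 for the `positive` class (assumed here: 1 = AI-generated)."""
    pairs = list(zip(y_true, y_pred))
    # True positives: generated items correctly flagged as generated.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: real items wrongly flagged as generated.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: generated items that were missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive prediction: F-1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for this example only.
labels = [1, 1, 1, 0, 0]
predictions = [1, 1, 0, 1, 0]
print(round(f1_score(labels, predictions), 3))  # prints 0.667
```

Because F-1 ignores true negatives, a detector that labels everything as real scores 0 regardless of how many real items it gets right, which is one reason the metric suits detection tasks like this one.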