AMMeBa: A Large-Scale Survey and Dataset of Media-Based Misinformation In-The-Wild

Nicholas Dufour, Arkanath Pathak, Pouya Samangouei, Nikki Hariri, Shashi Deshetti, Andrew Dudfield, Christopher Guess, Pablo Hernández Escayola, Bobby Tran, Mevan Babakar, Christoph Bregler
2024-05-21
Abstract: The prevalence and harms of online misinformation are a perennial concern for internet platforms, institutions and society at large. Over time, information shared online has become more media-heavy and misinformation has readily adapted to these new modalities. The rise of generative AI-based tools, which provide widely accessible methods for synthesizing realistic audio, images, video and human-like text, has amplified these concerns. Despite intense public interest and significant press coverage, quantitative information on the prevalence and modality of media-based misinformation remains scarce. Here, we present the results of a two-year study using human raters to annotate online media-based misinformation, mostly focusing on images, based on claims assessed in a large sample of publicly accessible fact checks with the ClaimReview markup. We present an image typology, designed to capture aspects of the image and manipulation relevant to the image's role in the misinformation claim. We visualize the distribution of these types over time. We show the rise of generative AI-based content in misinformation claims, and that its commonality is a relatively recent phenomenon, occurring significantly after heavy press coverage. We also show that "simple" methods dominated historically, particularly context manipulations, and continued to hold a majority as of the end of data collection in November 2023. The dataset, Annotated Misinformation, Media-Based (AMMeBa), is publicly available, and we hope that these data will serve as both a means of evaluating mitigation methods in a realistic setting and as a first-of-its-kind census of the types and modalities of online misinformation.
Computers and Society
What problem does this paper attempt to address?
The paper addresses online misinformation that relies on media, particularly images. The research team built a large-scale typology to capture the kinds of media-based misinformation that appear in the wild, and used it to annotate a large number of fact-checked misinformation claims. The main points the paper addresses are as follows:

1. **Current Status and Trends of Media Misinformation**: By analyzing 135,838 fact-check records dating back to 1995, the paper finds that approximately 80% involve media content. Images have always been the primary medium, but videos have become increasingly common since 2022.
2. **Impact of Generative AI Content**: Although public concern about AI-generated content has existed for a long time, such content did not significantly increase in misinformation claims until the spring of 2023. The research shows that simple methods (such as context manipulation) still dominate.
3. **Types of Image Manipulation**: Most image manipulations are simple and do not require advanced technical means. The most common type is context manipulation, in which an unmodified image is paired with a false claim about what it shows.
4. **Role of Text in Images**: Images often contain text, which usually conveys the core content of the misinformation.

Through this study, the authors hope to provide a large-scale, detailed typology of media-based misinformation that helps researchers better understand and address the problem. The dataset, AMMeBa (Annotated Misinformation, Media-Based), has been publicly released for use by researchers in related fields.
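The trends summarized above (for example, the share of context manipulations versus AI-generated content per year) come down to tallying annotated claims by type and time period. The following is a minimal sketch of such a tally. The record fields `year` and `manipulation_type`, the type labels, and the toy rows are all hypothetical. They do not reflect the actual AMMeBa schema or its data.

```python
from collections import Counter, defaultdict

# Toy records for illustration only. Field names and labels are
# assumptions, not the AMMeBa schema, and the rows are not real data.
records = [
    {"year": 2022, "manipulation_type": "context"},
    {"year": 2022, "manipulation_type": "context"},
    {"year": 2022, "manipulation_type": "content"},
    {"year": 2023, "manipulation_type": "context"},
    {"year": 2023, "manipulation_type": "ai_generated"},
    {"year": 2023, "manipulation_type": "context"},
    {"year": 2023, "manipulation_type": "content"},
]


def type_shares_by_year(rows):
    """Return {year: {type: fraction of that year's records}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["year"]][row["manipulation_type"]] += 1

    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {t: n / total for t, n in counter.items()}
    return shares


shares = type_shares_by_year(records)
for year in sorted(shares):
    print(year, shares[year])
```

Normalizing within each year, rather than reporting raw counts, keeps the comparison meaningful when the number of fact checks differs substantially from year to year.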