WildFake: A Large-scale Challenging Dataset for AI-Generated Images Detection

Yan Hong, Jianfu Zhang
2024-02-19
Abstract: The extraordinary ability of generative models enabled the generation of images with such high quality that human beings cannot distinguish Artificial Intelligence (AI) generated images from real-life photographs. The development of generation techniques opened up new opportunities but concurrently introduced potential risks to privacy, authenticity, and security. Therefore, the task of detecting AI-generated imagery is of paramount importance to prevent illegal activities. To assess the generalizability and robustness of AI-generated image detection, we present a large-scale dataset, referred to as WildFake, comprising state-of-the-art generators, diverse object categories, and real-world applications. WildFake dataset has the following advantages: 1) Rich Content with Wild collection: WildFake collects fake images from the open-source community, enriching its diversity with a broad range of image classes and image styles. 2) Hierarchical structure: WildFake contains fake images synthesized by different types of generators from GANs, diffusion models, to other generative models. These key strengths enhance the generalization and robustness of detectors trained on WildFake, thereby demonstrating WildFake's considerable relevance and effectiveness for AI-generated detectors in real-world scenarios. Moreover, our extensive evaluation experiments are tailored to yield profound insights into the capabilities of different levels of generative models, a distinctive advantage afforded by WildFake's unique hierarchical structure.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the detection of AI-generated images, especially the high-quality images produced by advanced generative models such as Generative Adversarial Networks (GANs) and Diffusion Models (DMs). As these models improve, recognizing forged images becomes increasingly important because such images can be used to spread false information and influence public opinion. Current detection methods have limited effectiveness on images from generative models they have not seen before. To address this, the paper proposes WildFake, a large-scale dataset of diverse, high-quality fake images collected from the open-source community and covering a wide range of image categories and styles. The dataset is organized hierarchically: it spans different types of generators, different architectures, personalized weights, and different versions of the same model series. This design is intended to give detectors trained on it better generalization and robustness, so that they can cope with the complex and varied conditions found in the real world. Compared with existing datasets, WildFake is not limited to one or two generators and includes a wider range of categories as well as high-quality user-generated images. The paper evaluates detectors trained on WildFake in a series of experiments and tests their robustness under degraded conditions. The hierarchical structure also enables in-depth analysis of generators at each level of the hierarchy. In summary, the paper aims to promote more effective techniques for detecting AI-generated images by providing the WildFake dataset, in response to constantly evolving image generation technologies.
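As a rough illustration of why hierarchical labels matter for testing generalization to unseen generators, the sketch below holds out one value at a chosen level of a hierarchy and uses it only for testing. Everything here is an assumption made for illustration: the `FakeImage` record, the three level names, the `held_out_split` helper, and the placeholder labels such as `arch_a` are hypothetical and do not reflect WildFake's actual file layout, metadata, or evaluation protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeImage:
    # Hypothetical metadata record; the field names are illustrative only.
    path: str
    generator_type: str  # e.g. "gan", "diffusion", "other"
    architecture: str    # placeholder architecture label
    version: str         # placeholder version label


# Levels of the (assumed) hierarchy that a split may be defined on.
LEVELS = ("generator_type", "architecture", "version")


def held_out_split(images, level, held_out):
    """Split fake images so that one value at `level` appears only in the test set."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    train = [im for im in images if getattr(im, level) != held_out]
    test = [im for im in images if getattr(im, level) == held_out]
    return train, test


# Toy records standing in for dataset entries.
images = [
    FakeImage("a.png", "gan", "arch_a", "v1"),
    FakeImage("b.png", "diffusion", "arch_b", "v1"),
    FakeImage("c.png", "diffusion", "arch_b", "v2"),
    FakeImage("d.png", "diffusion", "arch_c", "v1"),
]

# Coarse split: train on non-GAN fakes, test on GAN fakes.
train_type, test_type = held_out_split(images, "generator_type", "gan")

# Finer split: hold out a single version label.
train_ver, test_ver = held_out_split(images, "version", "v2")
```

Real photographs would need their own disjoint train and test partition, which this sketch leaves out. Splitting at a coarse level (generator type) versus a fine level (version) probes different degrees of distribution shift.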