DeepfakeArt Challenge: A Benchmark Dataset for Generative AI Art Forgery and Data Poisoning Detection

Hossein Aboutalebi, Dayou Mao, Rongqi Fan, Carol Xu, Chris He, Alexander Wong
2024-05-23
Abstract: The tremendous recent advances in generative artificial intelligence techniques have led to significant successes and promise in a wide range of applications, ranging from conversational agents and textual content generation to voice and visual synthesis. Amid the rise of generative AI and its increasingly widespread adoption, there has been significant and growing concern over the use of generative AI for malicious purposes. In the realm of visual content synthesis using generative AI, key areas of significant concern have been image forgery (e.g., generation of images containing or derived from copyrighted content) and data poisoning (i.e., generation of adversarially contaminated images). Motivated to address these key concerns and to encourage responsible generative AI, we introduce the DeepfakeArt Challenge, a large-scale challenge benchmark dataset designed specifically to aid in building machine learning algorithms for generative AI art forgery and data poisoning detection. The dataset comprises over 32,000 records across a variety of generative forgery and data poisoning techniques, and each entry consists of a pair of images that are either forgeries/adversarially contaminated or not. Each of the generated images in the DeepfakeArt Challenge benchmark dataset \footnote{The link to the dataset: http://anon\_for\
Computer Vision and Pattern Recognition, Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
The paper addresses problems raised by the widespread use of generative artificial intelligence (AI) in art, specifically the detection of image forgery and data poisoning. Generative AI has made remarkable progress in text, speech, and visual content synthesis, but it also carries a risk of abuse, such as image forgery (e.g., generated images that contain or are derived from copyrighted content) and data poisoning (maliciously contaminating data to mislead AI systems). To address this, the paper presents the DeepfakeArt Challenge, a large-scale benchmark dataset intended to help develop machine learning algorithms that detect generative AI art forgery and data poisoning. The dataset contains over 32,000 records spanning a variety of forgery and data poisoning techniques; each record is a pair of images that is either forged or adversarially contaminated, or untainted. All generated images have undergone comprehensive quality checks.

The paper discusses two core issues: art forgery (copyright infringement detection) and data poisoning. For art forgery, it defines the conditions under which copyright infringement occurs and emphasizes the importance of identifying potential infringement in content produced by generative models. For data poisoning, it introduces the concept of adversarial data injection, in which small perturbations (noise) are added to inputs in order to mislead AI decision-making systems.

The experimental section reports the performance of different models on the DeepfakeArt dataset and shows that current models have a relatively high false negative rate when detecting similar pairs, meaning that copyright infringement risks going undetected. In summary, by creating the DeepfakeArt Challenge dataset, the paper aims to encourage the research community to develop more robust detection tools that address the legal and security concerns generative AI raises in visual content synthesis.
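To make "adding small noise to mislead a decision system" concrete, here is a rough illustration only. It applies the fast gradient sign method (FGSM), a well-known adversarial technique swapped in here as an example; it is not necessarily how the DeepfakeArt contaminated images were produced. The four-pixel "image", the linear classifier, and its weights are all hypothetical values chosen for the sketch.

```python
from typing import List


def score(w: List[float], b: float, x: List[float]) -> float:
    """Toy linear classifier: a positive score means class 1, otherwise class 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v: float) -> int:
    """Return +1, -1, or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_perturb(w: List[float], x: List[float], eps: float) -> List[float]:
    """One FGSM step that pushes a linear classifier's score downward.

    For score(x) = w.x + b the gradient with respect to x is simply w, so
    moving every pixel by -eps * sign(w_i) lowers the score while changing
    no pixel by more than eps (an L-infinity budget). Results are clipped
    to the valid pixel range [0, 1].
    """
    return [min(1.0, max(0.0, xi - eps * sign(wi))) for wi, xi in zip(w, x)]


# Hypothetical 4-pixel "image" and made-up classifier weights.
w = [0.5, -0.3, 0.8, 0.2]
b = -0.5
x = [0.6, 0.4, 0.5, 0.3]

x_adv = fgsm_perturb(w, x, eps=0.1)
print(score(w, b, x) > 0)      # prints True: original image is class 1
print(score(w, b, x_adv) > 0)  # prints False: perturbed image is class 0
```

No pixel moves by more than 0.1, yet the predicted class flips. That small, bounded footprint is what makes adversarially contaminated images hard to spot by eye and motivates a dedicated detection benchmark.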
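The main experimental finding is stated in terms of the false negative rate on similar pairs. Assuming a record is a (source image, generated image, label) triple and that "similar" (forged or contaminated) is the positive class, the metric can be sketched as follows. `PairRecord`, its field names, and the file names are hypothetical and do not reflect the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PairRecord:
    """Hypothetical layout of one benchmark record (not the official schema)."""
    source_image: str     # placeholder path to the reference image
    generated_image: str  # placeholder path to the generated/contaminated image
    similar: bool         # True if the pair is a forgery or contaminated (positive class)


def false_negative_rate(records: Sequence[PairRecord], preds: Sequence[bool]) -> float:
    """FNR = FN / (FN + TP), computed over the truly similar pairs only."""
    if len(records) != len(preds):
        raise ValueError("records and preds must have the same length")
    positive_preds = [p for r, p in zip(records, preds) if r.similar]
    if not positive_preds:
        raise ValueError("no similar (positive) pairs to evaluate")
    false_negatives = sum(1 for p in positive_preds if not p)
    return false_negatives / len(positive_preds)


# Made-up example: 4 similar pairs followed by 2 dissimilar pairs.
records = [PairRecord(f"src_{i}.png", f"gen_{i}.png", similar=i < 4) for i in range(6)]
preds = [True, False, False, True, False, True]  # detector outputs
print(false_negative_rate(records, preds))  # prints 0.5: two of four similar pairs missed
```

The last dissimilar pair is a false positive, but it does not affect the FNR, which only looks at truly similar pairs. A high FNR therefore corresponds directly to forged or contaminated pairs slipping through undetected.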