Unveiling Universal Forensics of Diffusion Models with Adversarial Perturbations

Kangyang Xie, Jiaan Liu, Muzhi Zhu, Ganggui Ding, Zide Liu, Hao Chen, Hangyue Chen
DOI: https://doi.org/10.1109/ijcnn60899.2024.10650754
2024-01-01
Abstract: With state-of-the-art performance in image synthesis, especially in text-to-image generation, diffusion models (DMs) have received unprecedented attention. Despite their promising application prospects, fake images generated by diffusion models raise potential security concerns. To this end, we investigated whether images generated by DMs differ from those produced by other generative models, e.g., generative adversarial networks (GANs), and whether a universal forensic classifier exists. We first collected a dataset of 409k fake images generated by different types of DMs. Through a comprehensive analysis on this benchmark, we showed that common forensic artifacts are shared among DMs and that a forensic classifier trained on one model can generalize well to other, unseen generative models. Specifically, we first demonstrated that, despite the photorealism of images generated by DMs, they still contain artifacts, namely non-robust visual features, that are hard for humans but easy for machines to recognize. We then studied the characteristics of these artifacts from the perspective of adversarial attacks and unexpectedly found that there exists a universal adversarial perturbation that fools the classifier. Furthermore, we devised visualization and analysis tools focusing on the spectral properties of the generated samples and adversarial features, which demonstrate that augmentations in the frequency domain greatly affect the performance of the detectors.
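As an illustration of what an analysis of spectral properties can look like, the following is a minimal sketch of a radially averaged power spectrum, a common way to summarize how much energy an image carries at each spatial frequency. It is not the authors' tool: the function names `dft2_power` and `radial_profile` are hypothetical, the input is assumed to be a small grayscale image given as a list of rows, and the naive DFT is used only to keep the example dependency-free (in practice a library routine such as `numpy.fft.fft2` would be the usual choice).

```python
import cmath
import math


def dft2_power(img):
    """Naive 2D DFT power spectrum |F(u, v)|^2 of a small grayscale image."""
    h, w = len(img), len(img[0])
    power = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2.0 * math.pi * (u * y / h + v * x / w)
                    acc += img[y][x] * cmath.exp(1j * angle)
            power[u][v] = abs(acc) ** 2
    return power


def radial_profile(power):
    """Average the power over integer radial-frequency bins.

    Returns a dict mapping radius (distance from the zero frequency,
    rounded to the nearest integer) to the mean power in that bin.
    """
    h, w = len(power), len(power[0])
    sums, counts = {}, {}
    for u in range(h):
        for v in range(w):
            # Convert DFT indices to signed frequencies centred on DC.
            fu = u if u <= h // 2 else u - h
            fv = v if v <= w // 2 else v - w
            r = int(round(math.hypot(fu, fv)))
            sums[r] = sums.get(r, 0.0) + power[u][v]
            counts[r] = counts.get(r, 0) + 1
    return {r: sums[r] / counts[r] for r in sorted(sums)}


# Toy input (an assumption for illustration): an 8x8 checkerboard,
# i.e. the highest-frequency pattern the grid can represent.
checker = [[(x + y) % 2 for x in range(8)] for y in range(8)]
profile = radial_profile(dft2_power(checker))
for radius, mean_power in profile.items():
    print(radius, round(mean_power, 6))
```

For the checkerboard, the energy sits in the zero-frequency bin and in the highest radial bin, with the intermediate bins essentially empty. Comparing such profiles between real and generated images is one simple way to look for frequency-domain differences of the kind the abstract refers to.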