AdvDiffuser: Natural Adversarial Example Synthesis with Diffusion Models

Xinquan Chen,Xitong Gao,Juanjuan Zhao,Kejiang Ye,Cheng-Zhong Xu
DOI: https://doi.org/10.1109/iccv51070.2023.00421
2023-01-01
Abstract:Previous work on adversarial examples typically involves a fixed norm perturbation budget, which fails to capture the way humans perceive perturbations. Recent work has shifted towards natural unrestricted adversarial examples (UAEs) that breaks ℓ p perturbation bounds but nonetheless remain semantically plausible. Current methods use GAN or VAE to generate UAEs by perturbing latent codes. However, this leads to loss of high-level information, resulting in low-quality and unnatural UAEs. In light of this, we propose AdvDiffuser, a new method for synthesizing natural UAEs using diffusion models. It can generate UAEs from scratch or conditionally based on reference images. To generate natural UAEs, we perturb predicted images to steer their latent code towards the adversarial sample space of a particular classifier. We also propose adversarial inpainting based on class activation mapping to retain the salient regions of the image while perturbing less important areas. On CIFAR-10, CelebA and ImageNet, we demonstrate that it can defeat the most robust models on the RobustBench leaderboard with near 100% success rates. Furthermore, The synthesized UAEs are not only more natural but also stronger compared to the current state-of-the-art attacks. Specifically, compared with GA-attack, the UAEs generated with AdvDiffuser exhibit 6× smaller LPIPS perturbations, 2 ~ 3× smaller FID scores and 0.28 higher in SSIM metrics, making them perceptually stealthier. Finally, adversarial training with AdvDiffuser further improves the model robustness against attacks with unseen threat models. 1
What problem does this paper attempt to address?