Unlocking The Potential of Adaptive Attacks on Diffusion-Based Purification

Andre Kassis, Urs Hengartner, Yaoliang Yu
2024-11-26
Abstract: Diffusion-based purification (DBP) is a defense against adversarial examples (AEs), amassing popularity for its ability to protect classifiers in an attack-oblivious manner and resistance to strong adversaries with access to the defense. Its robustness has been claimed to ensue from the reliance on diffusion models (DMs) that project the AEs onto the natural distribution. We revisit this claim, focusing on gradient-based strategies that back-propagate the loss gradients through the defense, commonly referred to as "adaptive attacks". Analytically, we show that such an optimization method invalidates DBP's core foundations, effectively targeting the DM rather than the classifier and restricting the purified outputs to a distribution over malicious samples instead. Thus, we reassess the reported empirical robustness, uncovering implementation flaws in the gradient back-propagation techniques used thus far for DBP. We fix these issues, providing the first reliable gradient library for DBP and demonstrating how adaptive attacks drastically degrade its robustness. We then study a less efficient yet stricter majority-vote setting where the classifier evaluates multiple purified copies of the input to make its decision. Here, DBP's stochasticity enables it to remain partially robust against traditional norm-bounded AEs. We propose a novel adaptation of a recent optimization method against deepfake watermarking that crafts systemic malicious perturbations while ensuring imperceptibility. When integrated with the adaptive attack, it completely defeats DBP, even in the majority-vote setup. Our findings prove that DBP, in its current state, is not a viable defense against AEs.
Cryptography and Security, Computer Vision and Pattern Recognition, Machine Learning
### What problem does this paper attempt to solve?

This paper examines the vulnerability of diffusion-based purification (DBP) to adversarial attacks. DBP defends against adversarial examples (AEs) by using a diffusion model (DM) to project them onto the natural distribution, thereby protecting the classifier. The authors question the validity of this assumption and address the following questions:

1. **Is the theoretical basis of DBP reliable?** The paper analyzes gradient-based adaptive attacks, which optimize AEs by back-propagating the loss gradients through the defense. The authors show that this optimization undermines DBP's core foundations: it effectively targets the diffusion model rather than the classifier, restricting the purified outputs to a distribution over malicious samples.
2. **Are there flaws in previous studies of DBP robustness?** The authors find that previous studies contained technical errors and unaccounted-for factors in their gradient back-propagation implementations, so the computed gradients deviated from their theoretical values. To address this, they provide a new gradient library, DiffGrad, that computes the gradients accurately.
3. **How does DBP perform in the majority-vote setting?** In the stricter majority-vote (multi-path voting) setting, where the classifier evaluates multiple purified copies of the input, DBP remains partially robust. The authors attribute this to DBP's stochasticity, which weakens the effect of conventional norm-bounded perturbations across the multiple paths.
4. **How can adversarial attacks be improved to completely defeat DBP?** The authors propose a low-frequency (LF) optimization method that adapts the Optimizable Filters (OFs) from UnMarker, a recent attack on image watermarking. By accounting for correlations between pixels, it crafts systemic, imperceptible perturbations that affect a large number of pixels, bypassing DBP's randomness and noise-handling mechanisms.

### Main contributions

1. **Theoretical analysis**: The paper proves that adaptive attacks undermine the theoretical basis of DBP.
2. **Technical improvement**: It identifies and fixes the errors in existing gradient back-propagation methods and provides a reliable gradient-computation tool, DiffGrad.
3. **Experimental verification**: It demonstrates DBP's significant vulnerability under standard optimization methods in the single-path and majority-vote settings.
4. **Innovative method**: It proposes the low-frequency (LF) optimization method, which completely defeats DBP even under the stricter majority-vote setting.

Through these contributions, the paper shows that DBP in its current form is not an effective defense against adversarial examples, and it provides new directions and technical means for future research.
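
The difference between the single-path and majority-vote settings discussed above can be illustrated with a minimal sketch. It is not the paper's code and does not use DiffGrad: `toy_purify` and `toy_classify` are hypothetical stand-ins (Gaussian noise in place of a real diffuse-then-denoise pass, and a sign-of-mean rule in place of a trained classifier), chosen only to show how voting over several stochastic purifications aggregates per-copy decisions.

```python
import random
from collections import Counter
from typing import List

Vector = List[float]


def toy_purify(x: Vector, noise_std: float, rng: random.Random) -> Vector:
    """Hypothetical stand-in for one stochastic DBP pass.

    A real pipeline would diffuse the input and denoise it with a diffusion
    model; here we only add fresh Gaussian noise to mimic the randomness.
    """
    return [v + rng.gauss(0.0, noise_std) for v in x]


def toy_classify(x: Vector) -> int:
    """Hypothetical classifier: label 1 if the mean is positive, else 0."""
    return 1 if sum(x) / len(x) > 0 else 0


def single_path_predict(x: Vector, noise_std: float, rng: random.Random) -> int:
    """Single-path setting: classify one purified copy of the input."""
    return toy_classify(toy_purify(x, noise_std, rng))


def majority_vote_predict(
    x: Vector, noise_std: float, rng: random.Random, n_copies: int = 11
) -> int:
    """Majority-vote setting: classify several purified copies and return
    the most frequent label. An odd n_copies avoids ties for two labels."""
    votes = Counter(
        single_path_predict(x, noise_std, rng) for _ in range(n_copies)
    )
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.05, 0.05, 0.05, 0.05]  # input close to the decision boundary
    singles = [single_path_predict(x, 0.5, rng) for _ in range(10)]
    print("single-path labels:", singles)
    print("majority-vote label:", majority_vote_predict(x, 0.5, rng))
```

In this toy setup, an input near the decision boundary can receive different labels from one purification to the next, while the vote reports only the most frequent label. That mirrors why the majority-vote setting is stricter for an attacker, though a real evaluation would replace both stand-ins with an actual diffusion model and classifier.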