Jiayang Liu, Siyu Zhu, Siyuan Liang, Jie Zhang, Han Fang, Weiming Zhang, Ee-Chien Chang
Abstract: Deep neural networks (DNNs) are susceptible to adversarial examples, which introduce imperceptible perturbations to benign samples, deceiving DNN predictions. While some attack methods excel in the white-box setting, they often struggle in the black-box scenario, particularly against models fortified with defense mechanisms. Various techniques have emerged to enhance the transferability of adversarial attacks for the black-box scenario. Among these, input transformation-based attacks have demonstrated their effectiveness. In this paper, we explore the potential of leveraging data generated by Stable Diffusion to boost adversarial transferability. This approach draws inspiration from recent research that harnessed synthetic data generated by Stable Diffusion to enhance model generalization. In particular, previous work has highlighted the correlation between the presence of both real and synthetic data and improved model generalization. Building upon this insight, we introduce a novel attack method called Stable Diffusion Attack Method (SDAM), which incorporates samples generated by Stable Diffusion to augment input images. Furthermore, we propose a fast variant of SDAM to reduce computational overhead while preserving high adversarial transferability. Our extensive experimental results demonstrate that our method outperforms state-of-the-art baselines by a substantial margin. Moreover, our approach is compatible with existing transfer-based attacks to further enhance adversarial transferability.
What problem does this paper attempt to address?
The paper aims to improve the transferability of adversarial examples in black-box scenarios. Existing adversarial attack methods perform well in white-box scenarios, but they are often much less effective in black-box scenarios, especially against models hardened with defense mechanisms. The paper proposes a new attack, the Stable Diffusion Attack Method (SDAM), which augments the input image with data generated by Stable Diffusion and thereby improves the transferability of adversarial examples.
### Background and Motivation of the Paper
- **Background**: Deep neural networks (DNNs) are vulnerable to adversarial examples, which deceive model predictions by adding small perturbations to benign samples. An important property of adversarial examples is transferability: adversarial examples generated on one model can also deceive other models. Studying how to generate highly transferable adversarial examples is therefore important for identifying weaknesses in neural networks and improving their robustness.
- **Motivation**: Existing input transformation-based attacks mainly use real data for augmentation, which may limit the transferability of the attack. The authors note that synthetic data generated by Stable Diffusion has been shown to improve model generalization, so they propose using such synthetic data to improve the transferability of adversarial examples.
### Overview of the Method
- **SDAM Method**: SDAM augments the input image by mixing it with multiple samples generated by Stable Diffusion. Specifically, the mixed image is represented as:
\[
\bar{x} = \eta \cdot x_{\text{adv}}^t+(1 - \eta) \cdot x_j^t
\]
where \(x_{\text{adv}}^t\) is the adversarial example at the current iteration \(t\), \(x_j^t\) is the \(j\)-th sample generated by the pre-trained Stable Diffusion model, and \(\eta\) is the mixing ratio.
- **Fast SDAM**: To reduce computational overhead, the paper also proposes a fast variant of SDAM (SDAM-Fast). In this variant, the samples \(x_j^0\) are generated only once, in the initial iteration, and are reused for mixing in all subsequent iterations.
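The mixing step and the sample reuse in the fast variant can be illustrated with a minimal Python sketch. It is not the authors' implementation: images are represented as flat lists of floats, `generate` is a hypothetical stand-in for sampling from Stable Diffusion, and the names `mix`, `mixed_inputs`, and `cache` are introduced here only for illustration. The gradient computation and perturbation update of the attack itself are omitted.

```python
from typing import Callable, List, Optional, Sequence

Image = List[float]  # flattened image with pixel values in [0, 1] (assumption)


def mix(x_adv: Sequence[float], x_syn: Sequence[float], eta: float) -> Image:
    """Return eta * x_adv + (1 - eta) * x_syn, element-wise."""
    if len(x_adv) != len(x_syn):
        raise ValueError("images must have the same number of pixels")
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return [eta * a + (1.0 - eta) * s for a, s in zip(x_adv, x_syn)]


def mixed_inputs(
    x_adv: Sequence[float],
    generate: Callable[[int], Image],  # hypothetical generator, j -> sample x_j
    num_samples: int,
    eta: float,
    cache: Optional[List[Image]] = None,
) -> List[Image]:
    """Build the mixed images for one attack iteration.

    With cache=None, fresh samples are generated on every call.
    With a list passed as cache (fast-variant style), samples are
    generated on the first call only and reused afterwards.
    """
    if cache is None:
        samples = [generate(j) for j in range(num_samples)]
    else:
        if not cache:
            cache.extend(generate(j) for j in range(num_samples))
        samples = cache
    return [mix(x_adv, s, eta) for s in samples]
```

Passing the same list as `cache` on every iteration reproduces the reuse described for the fast variant: the generator runs `num_samples` times in total instead of `num_samples` times per iteration.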
### Experimental Results
- **Single-model Attack**: The experimental results show that SDAM achieves a significantly higher attack success rate on black-box models than the baseline methods. For example, against the Res-101 model, SDAM reaches an attack success rate of 90.9%, which is 17.5% higher than that of PAM.
- **Multi-model Attack**: SDAM also performs well when the adversarial examples are generated on an ensemble of models. For example, its attack success rates on Inc-v3 ens3, Inc-v3 ens4 and IncRes-v2 ens are 95.2%, 94.4% and 89.0% respectively, each more than 6% higher than that of PAM.
- **Combined with Input Transformation Methods**: SDAM can be combined with other input transformation methods (such as DIM, TIM and DI-TIM) to further improve the transferability of adversarial examples. The experimental results show that the combined attacks perform particularly well on adversarially trained models.
- **Attacking Defense Models**: SDAM also performs well against multiple defense methods. For example, its average attack success rate across seven defense models is 47.9%, which is significantly better than the baseline methods.
### Main Contributions
- Proposes a new attack method, SDAM, which improves the transferability of adversarial examples by using data generated by Stable Diffusion.
- Designs a fast variant, SDAM-Fast, to reduce computational overhead.
- Verifies the effectiveness of SDAM through extensive experiments and demonstrates its superior performance in multiple scenarios.
In conclusion, the paper improves the transferability of adversarial examples by introducing synthetic data generated by Stable Diffusion, offering a new direction for research on adversarial attacks.