Improving the Transferability of Adversarial Examples with Separable Positive and Negative Disturbances

Yuanjie Yan, Yuxuan Bu, Furao Shen, Jian Zhao
DOI: https://doi.org/10.1007/s00521-023-09259-5
2024-01-01
Abstract: Adversarial examples demonstrate the vulnerability of white-box models but exhibit weak transferability to black-box models. In image processing, each adversarial example usually consists of an original image and a disturbance. The disturbance is essential to the adversarial example and determines the attack success rate on black-box models. To improve transferability, we propose a new white-box attack method called separable positive and negative disturbance (SPND). SPND optimizes the positive and negative perturbations instead of the adversarial examples themselves. SPND also smooths the search space by replacing constrained disturbances with unconstrained variables, which improves the success rate of attacks on black-box models. Our method outperforms the other attack methods on the MNIST and CIFAR10 datasets. On the ImageNet dataset, the black-box attack success rate of SPND exceeds that of the optimal CW method by nearly ten percentage points under a perturbation of L_∞ = 0.3.
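The abstract does not give the exact parametrization, so the following is only a minimal sketch of the general idea under stated assumptions: the disturbance is split into a positive and a negative part, and each part is produced from an unconstrained variable through a sigmoid scaled by the budget eps, so that the L_∞ bound holds by construction. The sigmoid mapping, the toy logistic victim model, and the names `perturbation` and `attack_linear` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perturbation(w_pos, w_neg, eps):
    """Map unconstrained variables to a disturbance inside (-eps, eps).

    Assumed parametrization (not taken from the paper): each part is
    eps * sigmoid(w), and the disturbance is their difference.
    """
    pos = eps * sigmoid(w_pos)  # positive part, in (0, eps)
    neg = eps * sigmoid(w_neg)  # negative part, in (0, eps)
    return pos - neg

def attack_linear(x, weights, bias, label, eps=0.3, lr=0.5, steps=100):
    """Toy white-box attack on a binary logistic model (label is 0 or 1).

    Gradient ascent on the cross-entropy loss is performed over the
    unconstrained variables w_pos and w_neg, not over the image itself.
    """
    w_pos = np.zeros_like(x)  # sigmoid(0) = 0.5 for both parts,
    w_neg = np.zeros_like(x)  # so the initial disturbance is zero
    for _ in range(steps):
        x_adv = np.clip(x + perturbation(w_pos, w_neg, eps), 0.0, 1.0)
        p = sigmoid(x_adv @ weights + bias)
        # Gradient of the cross-entropy loss with respect to x_adv.
        # The clip is treated as the identity here, a simplification.
        grad_x = (p - label) * weights
        s_pos, s_neg = sigmoid(w_pos), sigmoid(w_neg)
        # Chain rule through eps * sigmoid(w); the negative part enters
        # the disturbance with a minus sign, hence the opposite update.
        w_pos += lr * grad_x * eps * s_pos * (1.0 - s_pos)
        w_neg -= lr * grad_x * eps * s_neg * (1.0 - s_neg)
    return np.clip(x + perturbation(w_pos, w_neg, eps), 0.0, 1.0)

# Hypothetical toy input and model, chosen only for illustration.
x = np.full(4, 0.5)
weights = np.array([1.0, -2.0, 0.5, 3.0])
bias = -1.0
x_adv = attack_linear(x, weights, bias, label=1, eps=0.3)
```

Because the bound is enforced by the mapping rather than by projecting after each step, the optimizer works on unconstrained variables throughout, which is one plausible reading of the "unconstrained variables" mentioned in the abstract.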