Abstract: Transferability of adversarial examples is critical for black-box attacks on deep learning models. While most existing studies focus on enhancing the transferability of untargeted adversarial attacks, few have studied how to generate transferable targeted adversarial examples that mislead models into predicting a specific class. Moreover, existing transferable targeted adversarial attacks usually fail to sufficiently characterize the target class distribution and therefore suffer from limited transferability. In this paper, we propose the Transferable Targeted Adversarial Attack (TTAA), which captures the distribution information of the target class from both label-wise and feature-wise perspectives to generate highly transferable targeted adversarial examples. To this end, we design a generative adversarial training framework consisting of a generator that produces targeted adversarial examples and feature-label dual discriminators that distinguish the generated adversarial examples from target class images. Specifically, the label discriminator guides the adversarial examples to learn label-related distribution information about the target class. Meanwhile, the feature discriminator extracts feature-wise information with strong cross-model consistency, enabling the adversarial examples to learn transferable distribution information. Furthermore, we introduce random perturbation dropping, which further enhances transferability by augmenting the diversity of the adversarial examples used during training. Experiments demonstrate that our method achieves excellent transferability for targeted adversarial examples: the targeted fooling rate reaches 95.13% when transferred from VGG-19 to DenseNet-121, significantly outperforming state-of-the-art methods.
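
The abstract does not define the targeted fooling rate. It is commonly taken to be the fraction of adversarial examples that the black-box (victim) model assigns to the attacker's chosen target class. The following is a minimal sketch under that assumption; the function name `targeted_fooling_rate` and the sample predictions are hypothetical and do not come from the paper.

```python
from typing import Sequence


def targeted_fooling_rate(predicted: Sequence[int], target: int) -> float:
    """Fraction of adversarial examples the victim model labels as `target`.

    Assumed definition (not stated in the abstract): an attack counts as a
    success only when the predicted class equals the chosen target class.
    """
    if len(predicted) == 0:
        raise ValueError("need at least one prediction")
    hits = sum(1 for label in predicted if label == target)
    return hits / len(predicted)


# Hypothetical predictions of a black-box model on five adversarial examples,
# with class 3 as the attacker's chosen target.
preds = [3, 3, 7, 3, 1]
rate = targeted_fooling_rate(preds, target=3)
print(f"targeted fooling rate: {rate:.2%}")  # prints "targeted fooling rate: 60.00%"
```

Under this reading, a misclassification into any class other than the target does not count as a success, which is what separates a targeted fooling rate from an untargeted one.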
