Abstract: Deep neural networks (DNNs) have been shown to be vulnerable to universal adversarial perturbations (UAPs): a single quasi-imperceptible perturbation that deceives a DNN on most input images. Existing UAP methods can be divided into data-dependent and data-independent methods. The former transfer poorly to black-box models because they rely too heavily on model-specific features. The latter perform worse in the white-box setting because they fail to exploit the model's responses to benign images. To address these issues, this paper proposes a novel universal adversarial attack that generates UAPs with strong transferability by disrupting model-agnostic features (e.g., edges or simple textures), which are invariant across models. Specifically, we first devise an objective function that weakens the significant channel-wise features and strengthens the less significant ones, where the two groups are partitioned by a designed strategy. Furthermore, the proposed objective function eliminates the dependency on labeled samples, allowing us to use out-of-distribution (OOD) data to train the UAP. To enhance attack performance with limited training samples, we update the UAP iteratively using the average gradient over each mini-batch, which encourages the UAP to capture the local information within the mini-batch. In addition, we introduce a momentum term that accumulates gradient information across iterative steps so that the UAP perceives global information over the training set. Finally, extensive experimental results demonstrate that the proposed method outperforms existing UAP approaches. Additionally, we exhaustively investigate the transferability of the UAP across models, datasets, and tasks.
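
The update rule described above (mini-batch average gradient plus a momentum term, driven by a label-free channel-wise objective) can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the linear "feature layer", the mean-activation rule for splitting channels into significant and less significant, the L1 normalization of the gradient, the sign step, and the L-infinity budget `eps` are all assumptions made for illustration, and every function name is hypothetical.

```python
import random


def channel_activations(weights, x):
    """Toy feature layer (assumption): one linear channel per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def channel_signs(weights, x):
    """+1 for 'significant' channels, -1 for the rest.

    Assumed partition rule: a channel is significant when its activation on
    the benign input is at least the mean activation over all channels.
    """
    clean = channel_activations(weights, x)
    threshold = sum(clean) / len(clean)
    return [1.0 if act >= threshold else -1.0 for act in clean]


def feature_loss(weights, x, delta):
    """Label-free objective: minimizing it weakens significant channels
    and strengthens less significant ones on the perturbed input."""
    signs = channel_signs(weights, x)
    adv = channel_activations(weights, [a + b for a, b in zip(x, delta)])
    return sum(s * a for s, a in zip(signs, adv))


def feature_loss_grad(weights, x):
    """Gradient of feature_loss w.r.t. delta (independent of delta here,
    because the toy feature layer is linear)."""
    grad = [0.0] * len(x)
    for row, s in zip(weights, channel_signs(weights, x)):
        for i, w in enumerate(row):
            grad[i] += s * w
    return grad


def train_uap(weights, data, eps=0.1, step=0.02, mu=0.9,
              batch_size=4, epochs=5, seed=0):
    """Iteratively update one shared perturbation over the whole data set."""
    rng = random.Random(seed)
    dim = len(data[0])
    delta = [0.0] * dim
    momentum = [0.0] * dim
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Average gradient over the mini-batch (local information).
            avg = [0.0] * dim
            for x in batch:
                for i, g in enumerate(feature_loss_grad(weights, x)):
                    avg[i] += g / len(batch)
            # Momentum accumulates normalized gradients across steps
            # (global information over the training set).
            l1 = sum(abs(v) for v in avg) or 1.0
            momentum = [mu * m + v / l1 for m, v in zip(momentum, avg)]
            # Signed descent step, then projection onto the eps-ball.
            delta = [
                max(-eps, min(eps, d - step * ((m > 0) - (m < 0))))
                for d, m in zip(delta, momentum)
            ]
    return delta
```

Because the objective uses only the model's own activations on benign inputs, `data` needs no labels, which is what would allow out-of-distribution samples to be used. In a real setting the analytic gradient would be replaced by automatic differentiation through an intermediate layer of a trained network.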
