Abstract: Universal adversarial patch attacks are easy to implement and have been shown to fool real-world deep convolutional neural networks (CNNs), posing a serious threat to practical CNN-based computer vision systems. Defenses against them remain understudied, and existing approaches face the following problems. Patch detection-based methods rely heavily on empirical clues, so their performance drops sharply under white-box or adaptive attacks. Methods based on adversarial training or certified defense are difficult to scale to large datasets or complex practical networks, owing to prohibitively high computational overhead or overly strong assumptions about the network structure. In this article, we focus on two widely adopted universal adversarial patch attacks: the universal targeted attack on image classifiers and the universal vanishing attack on object detectors. We find that, for popular CNNs, the success of an adversarial patch relies on feature vectors centered at the patch location that have a large norm in classifiers and a large channel-aware norm (CA-Norm) in detectors, and we present a mathematical explanation for this phenomenon. Based on this finding, we propose a simple but effective defense that uses a feature norm suppressing (FNS) layer, which renormalizes the feature norm with nonincreasing functions. As a differentiable module, FNS can be adaptively inserted into various CNN architectures to suppress the generation of large-norm feature vectors at multiple stages. Moreover, FNS is efficient: it has no trainable parameters and very low computational overhead. We evaluate the proposed defense across multiple CNN architectures and datasets against strong adaptive white-box attacks in both visual classification and detection tasks. In both tasks, FNS significantly outperforms previous defenses in adversarial robustness while having a relatively small effect on performance on benign images. Code is available at https://github.com/jschenthu/FNS.
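
To make the idea of renormalizing feature norms concrete, the following is a minimal NumPy sketch of a parameter-free suppression step applied to a single feature map. It is an illustration under stated assumptions, not the authors' implementation: the function name `suppress_feature_norms`, the use of the mean spatial norm as the threshold, and the clipping-style scaling factor `min(1, threshold / norm)` are all hypothetical choices made here. The abstract only states that the renormalization uses nonincreasing functions, and the CA-Norm variant for detectors is not sketched.

```python
import numpy as np


def suppress_feature_norms(features, eps=1e-12):
    """Rescale large-norm feature vectors of one feature map.

    features: array of shape (C, H, W); the vector at each spatial
    location (h, w) is features[:, h, w].

    Assumption for illustration only: the threshold is the mean L2 norm
    over all spatial locations, and the scaling factor is
    min(1, threshold / norm), which is nonincreasing in the norm.
    """
    # L2 norm of the channel vector at every spatial location, shape (H, W).
    norms = np.linalg.norm(features, axis=0)
    # Hypothetical threshold: the average norm over the feature map.
    threshold = norms.mean()
    # Locations at or below the threshold keep a factor of 1;
    # locations above it are scaled down so their norm equals the threshold.
    scale = np.minimum(1.0, threshold / (norms + eps))
    # Broadcast the (H, W) factor over the channel axis.
    return features * scale[None, :, :]


# Toy feature map: one location has a much larger norm than the rest,
# standing in for the response centered at a patch location.
toy = np.ones((4, 3, 3))
toy[:, 1, 1] = 10.0
suppressed = suppress_feature_norms(toy)
```

The step has no trainable parameters and uses only elementwise operations and a norm, so an equivalent layer in an automatic-differentiation framework could sit between convolutional stages and remain differentiable almost everywhere.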

Defending Against Universal Adversarial Patches by Clipping Feature Norms

Improving Adversarial Robustness Against Universal Patch Attacks Through Feature Norm Suppressing

An Adversarial Attack Via Feature Contributive Regions

A Universal Defense Strategy Against Adversarial Attacks Based on Attention-Guided

PatchGuard: A Provably Robust Defense against Adversarial Patches via Small Receptive Fields and Masking

Improving the Robustness of Deep Convolutional Neural Networks Through Feature Learning

Investigating and unmasking feature-level vulnerabilities of CNNs to adversarial perturbations

Patch-Wise Attack for Fooling Deep Neural Network

LAFIT: Efficient and Reliable Evaluation of Adversarial Defenses With Latent Features

PatchGuard++: Efficient Provable Attack Detection against Adversarial Patches

Detecting Adversarial Examples In Deep Neural Networks Using Normalizing Filters

Defending Adversarial Patches via Joint Region Localizing and Inpainting

Defending Against Universal Attacks Through Selective Feature Regeneration

Improving Adversarial Robustness via Feature Pattern Consistency Constraint

PatchZero: Defending against Adversarial Patch Attacks by Detecting and Zeroing the Patch

Defending Against Universal Patch Attacks by Restricting Token Attention in Vision Transformers

D2Defend: Dual-Domain based Defense against Adversarial Examples

Adversarial scratches: Deployable attacks to CNN classifiers

I Don't Know You, But I Can Catch You: Real-Time Defense against Diverse Adversarial Patches for Object Detectors

Cross-shaped Adversarial Patch Attack

Certified Defense Against Patch Attacks Via Mask-Guided Randomized Smoothing