Abstract: Universal adversarial patch attacks, which are easy to implement, have been shown to fool real-world deep convolutional neural networks (CNNs), posing a serious threat to practical CNN-based computer vision systems. Unfortunately, defenses against these attacks remain understudied, and current approaches face the following problems. Patch detection-based methods suffer dramatic performance drops against white-box or adaptive attacks because they rely heavily on empirical clues. Methods based on adversarial training or certified defense are difficult to scale to large datasets or complex practical networks, owing to prohibitively high computational overhead or overly strong assumptions about the network structure. In this article, we focus on two widely adopted universal adversarial patch attacks: the universal targeted attack on image classifiers and the universal vanishing attack on object detectors. We find that, for popular CNNs, the success of the adversarial patch relies on feature vectors centered at the patch location that have a large norm in classifiers and a large channel-aware norm (CA-Norm) in detectors, and we further present a mathematical explanation for this phenomenon. Based on this finding, we propose a simple but effective defense using the feature norm suppressing (FNS) layer, which renormalizes the feature norm by nonincreasing functions. As a differentiable module, FNS can be adaptively inserted into various CNN architectures to suppress the generation of large-norm feature vectors at multiple stages. Moreover, FNS is efficient, with no trainable parameters and very low computational overhead. We evaluate the proposed defense across multiple CNN architectures and datasets against strong adaptive white-box attacks in both visual classification and detection tasks. In both tasks, FNS significantly outperforms previous defenses in adversarial robustness while having relatively little impact on performance on benign images. Code is available at https://github.com/jschenthu/FNS.
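The abstract does not state the exact renormalization function used by FNS. As a minimal sketch, assuming each spatial feature vector is rescaled by a nonincreasing factor of its norm relative to the mean norm of the feature map, the core idea of capping large-norm vectors can be illustrated in plain Python. The names `fns_sketch` and `suppress_factor` and the threshold `tau` are hypothetical and are not taken from the authors' code; the linked repository is the authority on the real layer.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean norm of one feature vector (all channels at one spatial location)."""
    return math.sqrt(sum(x * x for x in v))


def suppress_factor(ratio: float, tau: float) -> float:
    """Hypothetical nonincreasing scale factor.

    Returns 1 when the norm ratio is at most tau and tau / ratio above it,
    so the factor never increases as the norm grows (continuous at ratio == tau).
    """
    return 1.0 if ratio <= tau else tau / ratio


def fns_sketch(features: Sequence[Sequence[float]], tau: float = 2.0,
               eps: float = 1e-12) -> List[Vector]:
    """Rescale per-location feature vectors so no norm exceeds tau * mean norm.

    `features` holds one channel vector per spatial location (a flattened H*W grid).
    There are no trainable parameters; `tau` is an assumed fixed hyperparameter.
    """
    if not features:
        return []
    norms = [l2_norm(v) for v in features]
    mean_norm = sum(norms) / len(norms)
    out: List[Vector] = []
    for v, n in zip(features, norms):
        ratio = n / (mean_norm + eps)  # norm relative to the feature-map average
        s = suppress_factor(ratio, tau)
        out.append([s * x for x in v])  # direction kept, only the norm is shrunk
    return out


if __name__ == "__main__":
    # Three ordinary locations with norm 1 and one patch-like location with norm 10.
    feats = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [6.0, 8.0]]
    for vec in fns_sketch(feats, tau=2.0):
        print(vec, round(l2_norm(vec), 4))
```

In this example the mean norm is 3.25, so with `tau = 2.0` the outlier's norm is capped at 6.5 while the three ordinary vectors pass through unchanged. Note that the mean here includes the outlier itself; a real layer might use a different reference statistic, a smoother nonincreasing function, or, for detectors, the channel-aware norm mentioned above, none of which this sketch attempts to reproduce.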

Adaptive Feature Alignment for Adversarial Training

Adversarial Feature Alignment: Balancing Robustness and Accuracy in Deep Learning via Adversarial Training

An Adversarial Attack Via Feature Contributive Regions

Feature Augmentation for Adversarial Robustness

Selective Domain-Invariant Feature Alignment Network for Face Anti-Spoofing

Trust-aware Conditional Adversarial Domain Adaptation with Feature Norm Alignment

Improving Adversarial Robustness of 3D Point Cloud Classification Models

Adversarial Feature Stacking for Accurate and Robust Predictions

Improving Adversarial Robustness via Feature Pattern Consistency Constraint

Enhancing Robust Representation in Adversarial Training: Alignment and Exclusion Criteria

Improving the Robustness of Deep Convolutional Neural Networks Through Feature Learning

A Universal Defense Strategy Against Adversarial Attacks Based on Attention-Guided

Push Stricter to Decide Better: A Class-Conditional Feature Adaptive Framework for Improving Adversarial Robustness

Towards Both Accurate and Robust Neural Networks Without Extra Data

CAFA: Class-Aware Feature Alignment for Test-Time Adaptation

Adversarial Feature Augmentation and Normalization for Visual Recognition

Improving Adversarial Robustness Against Universal Patch Attacks Through Feature Norm Suppressing

Attack-Agnostic Adversarial Detection

Enhancing Intrinsic Adversarial Robustness via Feature Pyramid Decoder

Learning More Robust Features with Adversarial Training

General Adversarial Defense via Pixel Level and Feature Level Distribution Alignment