Improving Adversarial Robustness Against Universal Patch Attacks Through Feature Norm Suppressing

Cheng Yu, Jiansheng Chen, Yu Wang, Youze Xue, Huimin Ma
DOI: https://doi.org/10.1109/TNNLS.2023.3326871
2023-11-02
Abstract: Universal adversarial patch attacks, which are easy to implement, have been shown to fool real-world deep convolutional neural networks (CNNs), posing a serious threat to practical CNN-based computer vision systems. Unfortunately, current defenses remain understudied and face the following problems. Patch detection-based methods suffer dramatic performance drops against white-box or adaptive attacks because they rely heavily on empirical clues. Methods based on adversarial training or certified defense are difficult to scale to large datasets or complex practical networks, owing to prohibitively high computational overhead or overly strong assumptions about the network structure. In this article, we focus on two widely adopted universal adversarial patch attacks, namely the universal targeted attack on image classifiers and the universal vanishing attack on object detectors. We find that, for popular CNNs, the success of the adversarial patch relies on feature vectors centered at the patch location that have a large norm in classifiers and a large channel-aware norm (CA-Norm) in detectors, and we present a mathematical explanation for this phenomenon. Based on this, we propose a simple but effective defense using the feature norm suppressing (FNS) layer, which renormalizes the feature norm with nonincreasing functions. As a differentiable module, FNS can be adaptively inserted into various CNN architectures to suppress the generation of large-norm feature vectors at multiple stages. Moreover, FNS is efficient, with no trainable parameters and very low computational overhead. We evaluate the proposed defense across multiple CNN architectures and datasets against strong adaptive white-box attacks in both visual classification and detection tasks. In both tasks, FNS significantly outperforms previous defenses in adversarial robustness, with relatively little impact on performance on benign images. Code is available at https://github.com/jschenthu/FNS.
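The idea of suppressing large-norm feature vectors can be illustrated with a small, parameter-free sketch. This is not the authors' FNS layer: the abstract does not give the exact renormalization function, so the function name `suppress_feature_norms`, the median-based threshold, and the `tau_ratio` and `power` arguments are all hypothetical choices made for illustration. The sketch only shows the general mechanism: feature vectors whose L2 norm is far above typical values are rescaled so that the output norm is a nonincreasing function of the input norm, while ordinary vectors pass through unchanged.

```python
import math
import statistics


def suppress_feature_norms(feature_map, tau_ratio=2.0, power=2.0):
    """Rescale unusually large feature vectors in an H x W x C feature map.

    feature_map is a nested list: rows -> spatial positions -> channel values.
    All names and constants here are illustrative assumptions, not the
    formulation used in the paper.
    """
    # L2 norm of the feature vector at every spatial position.
    norms = [[math.sqrt(sum(v * v for v in vec)) for vec in row]
             for row in feature_map]

    # Hypothetical threshold: a multiple of the median norm over all positions.
    tau = tau_ratio * statistics.median(n for row in norms for n in row)

    output = []
    for row, norm_row in zip(feature_map, norms):
        new_row = []
        for vec, n in zip(row, norm_row):
            if n <= tau:
                scale = 1.0  # typical vectors are left untouched
            else:
                # Output norm is tau**power / n**(power - 1), which does not
                # increase as n grows (for power >= 1).
                scale = (tau / n) ** power
            new_row.append([v * scale for v in vec])
        output.append(new_row)
    return output


# A 2 x 2 map with one outlier position, standing in for a patch location.
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[1.0, 0.0], [30.0, 40.0]]]
suppressed = suppress_feature_norms(features)
```

In this toy input, the three unit-norm vectors are returned unchanged, and the outlier with norm 50 is scaled down to a norm well below the threshold. A real layer would operate on tensors inside the network and would need to remain differentiable, as the abstract states for FNS.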