Abstract: Deep neural networks (DNNs) are vulnerable to adversarial attacks, which deliberately perturb the original inputs so that state-of-the-art classifiers misclassify them with high confidence; this vulnerability raises concerns about the robustness of DNNs. Adversarial training, the main heuristic method for improving adversarial robustness and the first line of defense against adversarial attacks, requires many sample-by-sample computations to enlarge the training set and is usually not strong enough to protect an entire network. This paper offers a new perspective on adversarial robustness, shifting the focus from the network as a whole to the critical part of the region near the decision boundary of a given class. From this perspective, we propose a method that generates a single, image-agnostic adversarial perturbation that carries semantic information about the directions toward the fragile parts of the decision boundary and causes inputs to be misclassified as a specified target class. We call adversarial training based on such perturbations "region adversarial training" (RAT). RAT resembles classical adversarial training but differs in that it reinforces the semantic information missing from the relevant regions. Experimental results on the MNIST and CIFAR-10 datasets show that this approach greatly improves adversarial robustness even when only a very small subset of the training data is used. Moreover, the retrained model can defend against the fast gradient sign method, universal perturbation, projected gradient descent, and Carlini and Wagner attacks, whose patterns differ completely from those the model encountered during retraining. (C) 2021 Elsevier B.V. All rights reserved.
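The abstract describes region adversarial training only at a high level, so the following is a minimal, hypothetical sketch of its two named ingredients, not the authors' procedure: (1) a single image-agnostic perturbation shared by all inputs that pushes them toward a specified target class, and (2) retraining on inputs shifted by that perturbation while keeping their true labels. It assumes a toy linear softmax classifier in NumPy and swaps in a projected sign-gradient (PGD-style) search for the shared perturbation. The function names and the parameters `eps`, `step`, `iters`, `epochs`, and `lr` are illustrative assumptions; how the paper locates the fragile parts of the decision boundary is not specified here.

```python
import numpy as np

# Hypothetical sketch, NOT the paper's RAT algorithm. Assumes a toy linear
# softmax classifier with weights W of shape (d, k) and bias b of shape (k,).


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def universal_targeted_perturbation(X, W, b, target, eps=0.5, step=0.05, iters=50):
    """Return one delta of shape (d,) that, added to every row of X, raises the
    probability of class `target`; delta stays in the L-infinity ball of radius eps."""
    k = W.shape[1]
    onehot = np.eye(k)[target]
    delta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = softmax((X + delta) @ W + b)          # (n, k) predictions on shifted inputs
        # Gradient of the mean targeted cross-entropy -log p_target w.r.t. delta.
        grad = ((p - onehot) @ W.T).mean(axis=0)  # (d,)
        delta = delta - step * np.sign(grad)      # signed step toward the target class
        delta = np.clip(delta, -eps, eps)         # project back onto the eps-ball
    return delta


def sgd_step(X, y, W, b, lr=0.1):
    """One full-batch gradient step on the mean cross-entropy."""
    k = W.shape[1]
    G = (softmax(X @ W + b) - np.eye(k)[y]) / len(X)  # dLoss/dlogits, shape (n, k)
    return W - lr * X.T @ G, b - lr * G.sum(axis=0)


def retrain_with_universal_perturbation(X, y, W, b, target, epochs=100, eps=0.5):
    """Hypothetical loop: recompute the shared perturbation, then train on
    clean and perturbed copies of X with the original labels y."""
    delta = np.zeros(X.shape[1])
    for _ in range(epochs):
        delta = universal_targeted_perturbation(X, W, b, target, eps=eps)
        X_aug = np.vstack([X, X + delta])
        y_aug = np.concatenate([y, y])
        W, b = sgd_step(X_aug, y_aug, W, b)
    return W, b, delta


if __name__ == "__main__":
    # Toy data: two well-separated 2-D Gaussian blobs (an assumption for illustration).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal([-2.0, 0.0], 0.5, (100, 2)),
                   rng.normal([2.0, 0.0], 0.5, (100, 2))])
    y = np.array([0] * 100 + [1] * 100)
    W, b, delta = retrain_with_universal_perturbation(
        X, y, np.zeros((2, 2)), np.zeros(2), target=1)
    acc = (np.argmax(X @ W + b, axis=1) == y).mean()
    print("shared delta:", delta, "clean accuracy:", acc)
```

The key design point is that `delta` has shape `(d,)` rather than `(n, d)`: one perturbation is optimized against the averaged loss over all inputs, which is what makes it image-agnostic, in contrast to per-sample adversarial training.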

Adaptive Retraining for Neural Network Robustness in Classification

Improving Model Robustness Against Adversarial Examples with Redundant Fully Connected Layer

Improving Adversarial Robustness Requires Revisiting Misclassified Examples

Towards Robust Training of Neural Networks by Regularizing Adversarial Gradients

An adversarial defense algorithm based on robust U-net

Towards robust neural networks via a global and monotonically decreasing robustness training strategy

Robustness of Deep Neural Networks to Adversarial Examples

Robust Adversarial Attacks on Imperfect Deep Neural Networks in Fault Classification

A constrained optimization approach to improve robustness of neural networks

Understanding Adversarial Training: Increasing Local Stability of Neural Nets through Robust Optimization

Towards Class-wise Robustness Analysis

Improving the Accuracy-Robustness Trade-Off of Classifiers via Adaptive Smoothing

An Empirical Study on the Effect of Training Data Perturbations on Neural Network Robustness

An Orthogonal Classifier for Improving the Adversarial Robustness of Neural Networks

A Survey of Neural Network Robustness Assessment in Image Recognition

Improving adversarial robustness of deep neural networks by using semantic information

Robustra: Training Provable Robust Neural Networks over Reference Adversarial Space

Towards Certified Probabilistic Robustness with High Accuracy

Over-parameterization and Adversarial Robustness in Neural Networks: An Overview and Empirical Analysis

Interpreting and Evaluating Neural Network Robustness