Abstract: In recent work, robust networks have consistently exhibited more discriminative saliency maps, which have been shown to indicate sufficient adversarial robustness. In existing safe training paradigms such as adversarial training, however, the progressive saliency information that captures which input semantic features a model's prediction relies on has not yet been fully explored. We therefore consider incorporating the posterior saliency properties of robust models into training as an efficient supervision signal for robust learning. This provides an alternative direction for enhancing robustness, from the perspective of saliency interpretability. In this article, to harden the model, we propose to optimize the discrimination of intermediate gradient-based saliency and maintain its consensus during training, which encourages the model to rely on task-relevant features from salient regions, such as object edges in an image. We then introduce the Adversarially Gradient-based Saliency Consensus Training method, dubbed Adv-GSCT. Within it, we preserve the similarity between the learned model saliency and a target saliency that serves as a label, approximated in the most offending case, which represents the scenario with the least but essential information. Meanwhile, a constructed pseudo-input coupled with feature importance is fed into the model to ensure the discrimination of the estimated target saliency. Besides providing a novel insight into adversarial defense, Adv-GSCT differs from adversarial training, currently the most effective defense, in that it does not need multiple iterative generations of adversarial perturbations, whose computational cost and sensitivity to the direction of prediction are a concern. Finally, extensive performance evaluations on the MNIST, CIFAR-10 and ImageNet datasets demonstrate the superiority of our proposed method.
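To make the idea of "similarity between the learned model saliency and the target one" concrete, the following is a minimal sketch, not the Adv-GSCT implementation. It assumes a toy linear softmax classifier (logits W·x), takes the saliency to be the gradient of the cross-entropy loss with respect to the input, and measures consensus with a cosine-similarity penalty. The function names, the toy model, and the choice of cosine similarity are all illustrative assumptions; the abstract does not specify how the target saliency or the similarity measure is computed.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def input_gradient_saliency(W, x, y):
    # Hypothetical helper: gradient of the cross-entropy loss w.r.t. the input x
    # for a linear softmax model with logits W @ x. For this model the gradient
    # has the closed form W^T (p - onehot(y)).
    logits = [sum(w * v for w, v in zip(row, x)) for row in W]
    probs = softmax(logits)
    err = [p - (1.0 if c == y else 0.0) for c, p in enumerate(probs)]
    return [sum(err[c] * W[c][d] for c in range(len(W))) for d in range(len(x))]

def cosine_similarity(a, b, eps=1e-12):
    # Cosine of the angle between two saliency vectors.
    dot = sum(u * v for u, v in zip(a, b))
    norm_a = math.sqrt(sum(u * u for u in a))
    norm_b = math.sqrt(sum(v * v for v in b))
    return dot / (norm_a * norm_b + eps)

def consensus_penalty(saliency, target_saliency):
    # Assumed penalty: 0 when the two saliency maps point the same way,
    # growing towards 2 as they point in opposite directions.
    return 1.0 - cosine_similarity(saliency, target_saliency)
```

In a training loop, a penalty of this kind would be added to the classification loss so that the saliency of the current model stays aligned with a target saliency; how that target is estimated is specific to the paper and is not reproduced here.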

Towards Better Robust Generalization with Shift Consistency Regularization

Feature Augmentation for Adversarial Robustness

Certifying Better Robust Generalization for Unsupervised Domain Adaptation

Inter-feature Relationship Certifies Robust Generalization of Adversarial Training

Perturbation diversity certificates robust generalization

Learning Representations Robust to Group Shifts and Adversarial Examples

Robustness, Privacy, and Generalization of Adversarial Training

Robust Local Features for Improving the Generalization of Adversarial Training

Towards Understanding Clean Generalization and Robust Overfitting in Adversarial Training

Alleviating Robust Overfitting of Adversarial Training With Consistency Regularization

Towards Robust Training of Neural Networks by Regularizing Adversarial Gradients

Latent Feature Relation Consistency for Adversarial Robustness

Adversarial Masking: Towards Understanding Robustness Trade-off for Generalization

Generalizability of Adversarial Robustness Under Distribution Shifts

Stability Analysis and Generalization Bounds of Adversarial Training

Exploring the Relationship Between Architectural Design and Adversarially Robust Generalization

Improving Model Robustness with Latent Distribution Locally and Globally

Consistency Regularization Helps Mitigate Robust Overfitting in Adversarial Training

To be Robust or to be Fair: Towards Fairness in Adversarial Training

Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off

Towards Gradient-Based Saliency Consensus Training for Adversarial Robustness