Abstract: Adversarial training, which injects adversarial examples into the training data, is a promising way to improve the robustness of deep learning models. However, most existing adversarial training approaches are based on one specific type of adversarial attack. Such an attack may not provide sufficiently representative samples from the adversarial domain, leading to weak generalization to adversarial examples from other attacks. Moreover, to scale to large datasets, adversarial perturbations on inputs are usually crafted during training by fast single-step adversaries. This work focuses mainly on adversarial training with the efficient single-step FGSM adversary. In this scenario, it is difficult to train a model that generalizes well because representative adversarial samples are lacking, i.e., the samples cannot accurately reflect the adversarial domain. To alleviate this problem, we propose a novel Adversarial Training with Domain Adaptation (ATDA) method. Our intuition is to regard adversarial training on the FGSM adversary as a domain adaptation task with a limited number of target domain samples. The main idea is to learn a representation that is semantically meaningful and domain invariant across the clean domain and the adversarial domain. Empirical evaluations on Fashion-MNIST, SVHN, CIFAR-10 and CIFAR-100 demonstrate that ATDA greatly improves the generalization of adversarial training and the smoothness of the learned models, and outperforms state-of-the-art methods on standard benchmark datasets. To show the transferability of our method, we also extend ATDA to adversarial training on iterative attacks such as PGD-Adversarial Training (PAT), where the defense performance improves considerably.
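As a rough illustration of the two ingredients the abstract names, the sketch below crafts a single-step FGSM perturbation and measures how far the adversarial samples drift from the clean ones. It is a minimal sketch under stated assumptions, not the ATDA method itself: the logistic model, the function names (`fgsm`, `bce_loss`, `mean_discrepancy`), and the mean-based discrepancy are hypothetical stand-ins chosen for brevity, and the abstract does not specify which domain adaptation losses ATDA uses.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a toy logistic model (an assumption)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce_loss(w, b, x, y):
    """Binary cross-entropy of the toy model on one labeled sample."""
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def loss_grad_wrt_input(w, b, x, y):
    """Analytic gradient of the loss w.r.t. the input: (p - y) * w."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Single-step FGSM: move each input coordinate by eps along the
    sign of the loss gradient, which increases the loss to first order."""
    grad = loss_grad_wrt_input(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


def mean_discrepancy(feats_a, feats_b):
    """Average absolute gap between per-dimension means of two batches.
    Illustrative stand-in for a domain discrepancy term, not ATDA's loss."""
    dim = len(feats_a[0])
    mean_a = [sum(f[j] for f in feats_a) / len(feats_a) for j in range(dim)]
    mean_b = [sum(f[j] for f in feats_b) / len(feats_b) for j in range(dim)]
    return sum(abs(a - c) for a, c in zip(mean_a, mean_b)) / dim


if __name__ == "__main__":
    w, b = [2.0, -1.0], 0.0   # hypothetical model parameters
    x, y = [0.5, 0.5], 1      # one clean sample and its label
    x_adv = fgsm(w, b, x, y, eps=0.1)
    print("clean loss:", bce_loss(w, b, x, y))
    print("adversarial loss:", bce_loss(w, b, x_adv, y))
    print("clean/adversarial gap:", mean_discrepancy([x], [x_adv]))
```

In a training loop, a term like the discrepancy above would be added to the classification loss on clean and adversarial batches, so that the learned representation is pushed to treat the two domains alike.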

Adversarial Training using Contrastive Divergence

An Adversarial Attack Via Feature Contributive Regions

MAE-MACD: the Masked Adversarial Contrastive Distillation Algorithm Grounded in Masked Autoencoders

Adversarial Training of Deep Neural Networks Guided by Texture and Structural Information

Adv-BDPM: Adversarial Attack Based on Boundary Diffusion Probability Model

CDTA: A Cross-Domain Transfer-Based Attack with Contrastive Learning

LADDER: Latent boundary-guided adversarial training

Using Single-Step Adversarial Training to Defend Iterative Adversarial Examples

Feature Distillation With Guided Adversarial Contrastive Learning

Towards Adversarial Robust Representation Through Adversarial Contrastive Decoupling

Adversarial Training: embedding adversarial perturbations into the parameter space of a neural network to build a robust system

Dual Head Adversarial Training

An efficient adversarial example generation algorithm based on an accelerated gradient iterative fast gradient

Improving the Generalization of Adversarial Training with Domain Adaptation

AdCo: Adversarial Contrast for Efficient Learning of Unsupervised Representations from Self-Trained Negative Adversaries

A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning

Improving Transferability of Adversarial Examples With Input Diversity

Ensemble Adversarial Training: Attacks and Defenses

Adversarial Training: A Survey

Adversarial Distributional Training for Robust Deep Learning

The Enemy of My Enemy is My Friend: Exploring Inverse Adversaries for Improving Adversarial Training