Adversarial Training on Purification (AToP): Advancing Both Robustness and Generalization

Guang Lin, Chao Li, Jianhai Zhang, Toshihisa Tanaka, Qibin Zhao

2024-08-23

Abstract: Deep neural networks are known to be vulnerable to well-designed adversarial attacks. The most successful defense technique, adversarial training (AT), can achieve optimal robustness against particular attacks but cannot generalize well to unseen attacks. Another effective defense technique, adversarial purification (AP), can enhance generalization but cannot achieve optimal robustness. Meanwhile, both methods share one common limitation: degraded standard accuracy. To mitigate these issues, we propose a novel pipeline to acquire a robust purifier model, named Adversarial Training on Purification (AToP), which comprises two components: perturbation destruction by random transforms (RT) and fine-tuning (FT) of the purifier model with an adversarial loss. RT is essential to avoid overfitting to known attacks, so that robustness generalizes to unseen attacks, and FT is essential for improving robustness. To evaluate our method in an efficient and scalable way, we conduct extensive experiments on CIFAR-10, CIFAR-100, and ImageNette to demonstrate that our method achieves optimal robustness and exhibits generalization ability against unseen attacks.

Computer Vision and Pattern Recognition, Artificial Intelligence

What problem does this paper attempt to address?

The paper addresses the following problem: how to improve robustness against known attacks while keeping the ability to generalize to unseen attacks, without sacrificing standard accuracy on clean samples. Existing Adversarial Training (AT) methods can achieve the best robustness against the specific attacks they are trained on, but they do not generalize well to unseen attacks. Methods based on Adversarial Purification (AP) generalize better, but are less robust against known attacks. Both families also reduce standard accuracy, which limits their use in practice.

To solve these problems, the authors propose a new defense technique, Adversarial Training on Purification (AToP), which combines the advantages of AT and AP through two main components:

1. **Perturbation Destruction**: Use Random Transforms (RT) to destroy the perturbation structure in adversarial samples, which supports defense against unseen attacks.
2. **Fine-tuning the Purifier Model**: Use an adversarial loss to fine-tune the purifier model so that it generates high-quality purified samples and further improves robustness.

With this method, the authors aim to obtain a robust purifier model that defends against known attacks, generalizes to unseen attacks, and maintains high accuracy on clean samples. Experimental results on CIFAR-10, CIFAR-100, and ImageNette show that the method achieves state-of-the-art performance across multiple attack scenarios.
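The two components can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration and is not taken from the paper's implementation: random pixel masking stands in for the random transform, a single linear map (`w_g`) stands in for the purifier, a linear softmax classifier (`w_f`) is treated as fixed, the loss is written as a reconstruction term plus a weighted classification term with weight `lam`, and the "adversarial" input is just a random sign perturbation, not a real attack.

```python
import numpy as np

def random_mask(x, mask_ratio, rng):
    """Random transform (RT): zero out a random subset of pixels.
    Masking is an assumed choice of RT for this sketch."""
    keep = rng.random(x.shape) >= mask_ratio
    return x * keep

def purify(x_t, w_g):
    """Hypothetical stand-in purifier: one linear map on flattened inputs."""
    return x_t @ w_g

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def atop_style_loss(x_clean, x_adv, labels, w_g, w_f, lam, mask_ratio, rng):
    """Assumed loss shape: reconstruction + lam * classification loss,
    both computed on the purified version of a randomly transformed input."""
    x_t = random_mask(x_adv, mask_ratio, rng)      # perturbation destruction
    x_pur = purify(x_t, w_g)                       # purification
    recon = np.mean((x_pur - x_clean) ** 2)        # stay close to the clean image
    adv = cross_entropy(x_pur @ w_f, labels)       # classifier should still be right
    return recon + lam * adv, recon, adv

rng = np.random.default_rng(0)
x_clean = rng.random((4, 16))                      # 4 toy "images", 16 pixels each
# Toy perturbation only; a real pipeline would craft x_adv with an actual attack.
x_adv = np.clip(x_clean + 0.03 * np.sign(rng.standard_normal((4, 16))), 0.0, 1.0)
labels = np.array([0, 1, 2, 1])
w_g = np.eye(16)                                   # purifier starts as identity
w_f = 0.1 * rng.standard_normal((16, 3))           # fixed toy classifier, 3 classes

total, recon, adv = atop_style_loss(
    x_clean, x_adv, labels, w_g, w_f, lam=0.5, mask_ratio=0.25, rng=rng
)
print(f"total={total:.4f} recon={recon:.4f} adv={adv:.4f}")
```

In a fine-tuning loop built on this sketch, only the purifier parameters (`w_g` here) would be updated to reduce the combined loss, and a fresh random mask would be drawn at every step so the purifier cannot rely on a fixed perturbation pattern.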

Robust Diffusion Models for Adversarial Purification

Robust Overfitting Does Matter: Test-Time Adversarial Purification With FGSM

Improve Adversarial Robustness Via Probabilistic Distributions Decoupled Network While Guaranteeing Clean Performance

An adversarial defense algorithm based on robust U-net

Enhancing Robust Representation in Adversarial Training: Alignment and Exclusion Criteria

Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff

Improving Adversarial Robustness via Attention and Adversarial Logit Pairing

ATRA: Efficient Adversarial Training with High-Robust Area

New Paradigm of Adversarial Training: Breaking Inherent Trade-Off between Accuracy and Robustness via Dummy Classes

Randomized Purifier Based on Low Adversarial Transferability for Adversarial Defense

Adversarial Training of Deep Neural Networks Guided by Texture and Structural Information

Improved Adversarial Training Through Adaptive Instance-wise Loss Smoothing

Defense Against Adversarial Attacks Using Topology Aligning Adversarial Training

Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning

Weighted Adaptive Perturbations Adversarial Training for Improving Robustness

Provable Unrestricted Adversarial Training without Compromise with Generalizability

Attacking Adversarial Attacks as A Defense

OTAD: An Optimal Transport-Induced Robust Model for Agnostic Adversarial Attack

FePN: A Robust Feature Purification Network to Defend Against Adversarial Examples

Learn from the Past: A Proxy based Adversarial Defense Framework to Boost Robustness