Abstract: Adversarial attacks have been shown to pose a serious threat to the security of artificial intelligence systems. Adversarial training was proposed to address this threat, but it incurs a high computational cost and degrades the model's original performance. Detection methods have emerged as an easier-to-deploy alternative. However, most existing detectors rely on the fragility of adversarial perturbations under input-level processing, such as reconstruction or bit reduction, and on the resulting differences in distribution, so they struggle to detect attacks with different perturbation patterns and strengths. A very recent method, ContraNet, addresses this by using the victim model’s prediction to guide input reconstruction, but this lowers the detection rate on clean images. This paper also focuses on input reconstruction, but uses semantic similarity comparison. Unlike ContraNet, we attempt to amplify the semantic inconsistency between an adversarial example (AE) and its reconstruction, based on the intrinsic features of the target classifier. In other words, we propose a reconstruction-based detection method that uses the classifier's intrinsic features to examine adversarial examples from the perspective of the target classifier rather than from the pixel-space distribution perceived by humans. On this basis, a feature extractor is trained without supervision through a modified SimCLR to perform semantic extraction, while the reconstruction of an adversarial example carries inconsistent semantic information at the pixel level, which allows clean samples and AEs to be distinguished. Our method defends effectively against various perturbations and different types of attacks, maintaining both high robust detection accuracy and a high clean detection rate.
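The detection idea described in the abstract (compare the semantics of an input with those of its reconstruction, and flag the input when they disagree) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `reconstruct`, `extract_features`, and `is_adversarial`, the use of cosine similarity as the comparison, and the threshold value are all assumptions made for illustration. In the paper's setting the extractor would be the network trained with the modified SimCLR and the reconstruction would come from a learned model.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as carrying no semantic agreement
    return dot / (norm_u * norm_v)


def is_adversarial(x, reconstruct, extract_features, threshold=0.5):
    """Flag x when its features disagree with those of its reconstruction.

    reconstruct and extract_features are hypothetical callables standing in
    for the learned reconstruction model and the semantic feature extractor.
    threshold is an assumed value, not one taken from the paper.
    """
    x_rec = reconstruct(x)
    similarity = cosine_similarity(extract_features(x), extract_features(x_rec))
    return similarity < threshold


if __name__ == "__main__":
    identity = lambda v: v
    negate = lambda v: [-a for a in v]
    # A faithful reconstruction keeps the features aligned: not flagged.
    print(is_adversarial([1.0, 2.0], identity, identity))
    # A reconstruction whose features point the opposite way: flagged.
    print(is_adversarial([1.0, 2.0], negate, identity))
```

In practice the threshold would be chosen on held-out clean data, for example to fix a target clean detection rate.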

Defense Against Adversarial Attacks by Reconstructing Images

An Adversarial Attack Via Feature Contributive Regions

Adversarial example defense based on image reconstruction

Improving Model Robustness Against Adversarial Examples with Redundant Fully Connected Layer.

Image Super-Resolution as a Defense Against Adversarial Attacks

Defense against adversarial attacks based on color space transformation

D2Defend: Dual-Domain based Defense against Adversarial Examples

Detecting Adversarial Examples Via Reconstruction-based Semantic Inconsistency

Defending Against Adversarial Examples Using Perceptual Image Hashing

RANDOM MASK: Towards Robust Convolutional Neural Networks

Defense against adversarial attacks by low‐level image transformations

Defective Convolutional Networks

Reversible Attack based on Local Visual Adversarial Perturbation

Adversarial Attacks Hidden in Plain Sight

Attacking Adversarial Attacks as A Defense

Adversarial Perturbations Prevail in the Y-Channel of the YCbCr Color Space

Delving into Deep Image Prior for Adversarial Defense: A Novel Reconstruction-based Defense Framework

Defending against adversarial attacks using spherical sampling-based variational auto-encoder

Are You Confident That You Have Successfully Generated Adversarial Examples?

Architectural Resilience to Foreground-and-Background Adversarial Noise

Enhancing Intrinsic Adversarial Robustness via Feature Pyramid Decoder