Abstract: Recent advances in deep neural network (DNN) techniques have increased the importance of the security and robustness of the algorithms in which DNNs are applied. However, several studies have demonstrated that neural networks are vulnerable to adversarial examples, which are generated by adding crafted adversarial noise to input images. Because adversarial noise is typically imperceptible to the human eye, it is difficult to defend DNNs against it. One method of defense is to detect adversarial examples by analyzing the characteristics of input images. Recent studies have used the hidden-layer outputs of the target classifier to improve robustness, but this requires access to the target classifier. Moreover, these methods have no post-processing step for the detected adversarial examples; they simply discard the detected adversarial images. To resolve these problems, we propose a novel detection-based method that predicts the adversarial noise and detects adversarial examples based on the predicted noise, without using any target classification information. We first generated adversarial examples and the corresponding adversarial noise, which is obtained as the residual between the original and adversarial images. Subsequently, we trained the proposed adversarial noise predictor to estimate the adversarial noise image and trained the adversarial detector using the input images and the predicted noise. The proposed framework has the advantage of being agnostic to the input image modality. Moreover, the predicted noise can be used to reconstruct the detected adversarial examples as non-adversarial images instead of discarding them. We tested the proposed method against the fast gradient sign method (FGSM), basic iterative method (BIM), projected gradient descent (PGD), DeepFool, and Carlini & Wagner adversarial attack methods on the CIFAR-10 and CIFAR-100 datasets provided by the Canadian Institute for Advanced Research (CIFAR).
Our method demonstrated significant improvements in detection accuracy compared with state-of-the-art methods and resolved the problem of wasting the detected adversarial examples. The proposed method, which is agnostic to the input image modality, demonstrated that the noise predictor successfully captured noise in the Fourier domain and that this improved performance on the detection task. Moreover, the reconstruction process based on the predicted noise resolved the lack of post-processing for the detected adversarial examples.
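The residual, frequency-domain, and reconstruction steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the sign convention (adversarial minus clean), the random sign perturbation standing in for a real attack, and reconstruction by plain subtraction are all assumptions made for illustration, and an oracle that returns the true noise takes the place of the trained noise predictor.

```python
import numpy as np

def adversarial_residual(clean, adversarial):
    """Adversarial noise taken as the residual (adversarial - clean); sign convention assumed."""
    return adversarial.astype(np.float64) - clean.astype(np.float64)

def log_magnitude_spectrum(noise):
    """Per-channel 2-D FFT log-magnitude of an (H, W, C) noise image."""
    spectrum = np.fft.fftshift(np.fft.fft2(noise, axes=(0, 1)), axes=(0, 1))
    return np.log1p(np.abs(spectrum))

def reconstruct(adversarial, predicted_noise):
    """Assumed reconstruction: subtract the predicted noise, then clip to [0, 1]."""
    return np.clip(adversarial - predicted_noise, 0.0, 1.0)

rng = np.random.default_rng(0)
clean = rng.uniform(0.1, 0.9, size=(32, 32, 3))  # CIFAR-sized stand-in image
epsilon = 8 / 255                                # hypothetical perturbation budget

# Stand-in for an attack: a random sign pattern scaled by epsilon (not a real FGSM step).
adversarial = clean + epsilon * rng.choice([-1.0, 1.0], size=clean.shape)

noise = adversarial_residual(clean, adversarial)
spectrum = log_magnitude_spectrum(noise)

# Oracle "predictor": with a perfect noise estimate, subtraction recovers the clean image.
restored = reconstruct(adversarial, noise)
```

In a real pipeline, the oracle `noise` passed to `reconstruct` would be replaced by the output of a trained predictor, and the quality of the restored image would depend on how accurate that prediction is.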

Joint contrastive learning and frequency domain defense against adversarial examples

Attack As Defense: Characterizing Adversarial Examples Using Robustness.

An Adversarial Attack Via Feature Contributive Regions

A Universal Defense Strategy Against Adversarial Attacks Based on Attention-Guided

Feature decoupling and interaction network for defending against adversarial examples

LDN-RC: a Lightweight Denoising Network with Residual Connection to Improve Adversarial Robustness

DiFNet: Densely High-Frequency Convolutional Neural Networks

Defense against adversarial attacks on deep convolutional neural networks through nonlocal denoising

Improving the Robustness of Deep Convolutional Neural Networks Through Feature Learning

FACL-Attack: Frequency-Aware Contrastive Learning for Transferable Adversarial Attacks

Adversarial example detection by predicting adversarial noise in the frequency domain

Invisible Adversarial Attack Against Deep Neural Networks: an Adaptive Penalization Approach

Frequency-based methods for improving the imperceptibility and transferability of adversarial examples

Defense Against Adversarial Attacks with Efficient Frequency-Adaptive Compression and Reconstruction

D2Defend: Dual-Domain based Defense against Adversarial Examples

CSFAdv: Critical Semantic Fusion Guided Least-Effort Adversarial Example Attacks

A hybrid adversarial training for deep learning model and denoising network resistant to adversarial examples

Adversarial perturbation denoising utilizing common characteristics in deep feature space

Mitigating Gradient-based Adversarial Attacks via Denoising and Compression

D3R-Net: Denoising Diffusion-Based Defense Restore Network for Adversarial Defense in Remote Sensing Scene Classification

Defense against adversarial attacks based on color space transformation