Abstract: Deep neural networks (DNNs) have recently been found to be susceptible to adversarial input perturbations. Most defense strategies rely on preprocessing-based denoising, which mitigates the impact of adversarial perturbations on DNNs by learning the distribution of nonadversarial data and projecting adversarial inputs onto the learned nonadversarial manifold. However, existing defense strategies commonly focus on reconstructing clean images while ignoring the role of the adversarial perturbations themselves. As a result, the reconstructed images fail to match the visual quality and classification accuracy of the original clean images, and the resulting improvement in adversarial robustness is limited. This paper proposes a feature decoupling-interaction network (FDIN), which introduces the concepts of clean features and adversarial features and separates these two kinds of features from input adversarial examples (AEs) in a feature decoupling-interaction manner. The clean features are used to reconstruct the input image so that it is as close as possible to the original clean image, and the adversarial features are used to reconstruct the adversarial perturbation. Adversarial perturbations are removed from the adversarial examples over multiple cross cycles to further improve the visual quality and classification accuracy of the reconstructed image. The features of the original clean image serve as prior knowledge that guides the network in learning the clean features of the adversarial examples, which improves the model's classification accuracy on clean examples. In addition, a classification loss function based on the Carlini & Wagner (CW) attack algorithm replaces the conventional cross-entropy loss to improve the adversarial robustness of the FDIN.
The experimental results show that the proposed method achieves better defense performance than current state-of-the-art methods in both standard tests and a range of attack tests, and even exceeds the test accuracy of the target classifier on the original test set.
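The abstract does not give the exact form of the CW-based classification loss. The sketch below assumes it mirrors the margin term of the Carlini & Wagner attack, applied from the defender's side: L = max(max_{i != y} Z_i - Z_y, -kappa), where Z are the target classifier's logits on the reconstructed image, y is the true label, and kappa is a confidence margin. The function names, the kappa value, and the example logits are hypothetical illustrations, not the FDIN authors' code; the comparison with cross-entropy is included only to show how the two losses behave differently.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Standard softmax cross-entropy, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def cw_margin_loss(logits: Sequence[float], label: int, kappa: float = 0.0) -> float:
    """CW-style margin loss on logits (assumed form, not taken from the paper).

    Returns max(max_{i != label} Z_i - Z_label, -kappa). Minimizing it pushes
    the true-class logit above the strongest competing logit; once the margin
    reaches kappa, the loss saturates at -kappa and contributes no gradient.
    """
    best_other = max(z for i, z in enumerate(logits) if i != label)
    return max(best_other - logits[label], -kappa)


if __name__ == "__main__":
    # Hypothetical logits for a 3-class problem; the true class is 0.
    examples = {
        "confident": [5.0, 1.0, 0.5],   # correct with margin 4.0
        "borderline": [2.0, 1.9, 0.0],  # correct with margin 0.1
        "wrong": [1.0, 3.0, 0.0],       # misclassified, margin -2.0
    }
    for name, z in examples.items():
        ce = cross_entropy(z, 0)
        cw = cw_margin_loss(z, 0, kappa=1.0)
        print(f"{name:10s} CE={ce:.4f} CW={cw:.4f}")
```

With kappa = 1.0, the confident example already sits at the floor value -1.0, while cross-entropy still assigns it a small positive loss and keeps pushing its logits apart. A margin loss of this kind therefore concentrates training on examples that are misclassified or close to the decision boundary; this is one plausible motivation for the substitution, not a claim made in the abstract.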

Adversarial perturbation denoising utilizing common characteristics in deep feature space

Attack As Defense: Characterizing Adversarial Examples Using Robustness

Image denoising algorithm based on adversarial learning using joint loss function

An Adversarial Attack Via Feature Contributive Regions

Feature decoupling and interaction network for defending against adversarial examples

LDN-RC: a Lightweight Denoising Network with Residual Connection to Improve Adversarial Robustness

Feature Denoising for Improving Adversarial Robustness

Detect and defense against adversarial examples in deep learning using natural scene statistics and adaptive denoising

Detecting adversarial samples by noise injection and denoising

A Universal Defense Strategy Against Adversarial Attacks Based on Attention-Guided

Defense Against Adversarial Attacks Using High-Level Representation Guided Denoiser

On the Limitations of Denoising Strategies as Adversarial Defenses

Defense against adversarial attacks on deep convolutional neural networks through nonlocal denoising

Pasadena: Perceptually Aware and Stealthy Adversarial Denoise Attack

Unsupervised Adversarial Perturbation Eliminating Via Disentangled Representations

Adversarial Examples Detection Beyond Image Space

Defense against adversarial attacks based on color space transformation

Evaluating Similitude and Robustness of Deep Image Denoising Models via Adversarial Attack

Adversarial Transferability in Deep Denoising Models: Theoretical Insights and Robustness Enhancement via Out-of-Distribution Typical Set Sampling

Improving Adversarial Robustness via Decoupled Visual Representation Masking

D3R-Net: Denoising Diffusion-Based Defense Restore Network for Adversarial Defense in Remote Sensing Scene Classification