Abstract: Recent studies have demonstrated that deep neural networks (DNNs) are vulnerable to adversarial examples, which seriously threaten security-sensitive applications. Existing works synthesize adversarial examples by perturbing the original (benign) images and use the L_p-norm to penalize the perturbations, which restricts the pixel-wise distance between the adversarial images and the corresponding benign images. However, they add perturbations globally to the benign images without explicitly considering their content or spatial structure, resulting in noticeable artifacts, especially in originally clean regions such as sky and smooth surfaces. In this paper, we propose an invisible adversarial attack, which synthesizes adversarial examples that are visually indistinguishable from benign ones. We adaptively distribute the perturbation according to human sensitivity to a local stimulus in the benign image: the lower the sensitivity, the larger the perturbation. Two types of adaptive adversarial attacks are proposed: 1) coarse-grained and 2) fine-grained. The former applies an L_p-norm penalty regularized by novel spatial constraints, which exploit the rich information in cluttered regions to mask the perturbation. The latter, called the Just Noticeable Distortion (JND)-based adversarial attack, uses the proposed JND_p metric to better measure perceptual similarity, and adaptively sets the penalty by weighting the pixel-wise perceptual redundancy of an image. We conduct extensive experiments on the MNIST, CIFAR-10 and ImageNet datasets and a comprehensive user study with 50 participants. The experimental results demonstrate that JND_p is a better metric for measuring perceptual similarity than the L_p-norm, and that the proposed adaptive adversarial attacks can synthesize adversarial examples indistinguishable from benign ones and outperform the state-of-the-art methods.
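The idea of distributing the perturbation according to local sensitivity can be illustrated with a minimal sketch. This is not the paper's formulation: the local standard deviation is used here only as an assumed stand-in for the sensitivity measure, and `local_std`, `adaptive_perturbation`, `grad`, `eps` and `k` are hypothetical names introduced for illustration. The sketch scales a sign-gradient step so that flat regions receive little or no perturbation while cluttered regions receive more.

```python
import numpy as np


def local_std(image, k=3):
    """Per-pixel standard deviation over a k x k neighbourhood (edge-padded).

    Assumption: local variation serves as a rough proxy for how well a
    region masks a perturbation. The paper's own measure may differ.
    """
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    # For an (H, W) image this yields an (H, W, k, k) view of all neighbourhoods.
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return windows.std(axis=(-2, -1))


def adaptive_perturbation(image, grad, eps=8 / 255, k=3):
    """Take one sign-gradient step whose size is weighted per pixel.

    image: 2-D array with values in [0, 1]
    grad:  hypothetical loss gradient w.r.t. the image (same shape)
    eps:   maximum per-pixel step size
    """
    sens = local_std(image, k)
    # Normalise to [0, 1]; flat regions get weight 0, the busiest pixel gets 1.
    weight = sens / (sens.max() + 1e-12)
    delta = eps * weight * np.sign(grad)
    return np.clip(image + delta, 0.0, 1.0)


# Usage: left half is a flat surface, right half is textured.
rng = np.random.default_rng(0)
img = np.full((8, 8), 0.5)
img[:, 4:] = rng.uniform(0.2, 0.8, size=(8, 4))
grad = np.ones_like(img)  # placeholder for a real model gradient
adv = adaptive_perturbation(img, grad)
```

In this toy input the flat columns away from the texture boundary are left untouched, and the per-pixel change never exceeds `eps`. A real attack would iterate this step against a model's loss and would replace the variance proxy with a perceptual model.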

RIA: A Reversible Network-based Imperceptible Adversarial Attack

RA-RevGAN: Region-Aware Reversible Adversarial Example Generation Network for Privacy-Preserving Applications

Towards Imperceptible and Robust Adversarial Example Attacks Against Neural Networks

IWA: Integrated Gradient based White-box Attacks for Fooling Deep Neural Networks

Perception-Driven Imperceptible Adversarial Attack Against Decision-Based Black-Box Models

Imperceptible Adversarial Attack via Invertible Neural Networks

Improving Model Robustness Against Adversarial Examples with Redundant Fully Connected Layer

Sensitive Region-Aware Black-Box Adversarial Attacks

Generating Imperceptible and Cross-Resolution Remote Sensing Adversarial Examples Based on Implicit Neural Representations

Improving the Transferability of Adversarial Examples with a Noise Data Enhancement Framework and Random Erasing

Restricted Region Based Iterative Gradient Method for Non-Targeted Attack

Invisible Adversarial Attack Against Deep Neural Networks: an Adaptive Penalization Approach

Demiguise Attack: Crafting Invisible Semantic Adversarial Perturbations with Perceptual Similarity

DI-AA: an Interpretable White-box Attack for Fooling Deep Neural Networks

Patch-Wise Attack for Fooling Deep Neural Network

Adversarial Attack? Don't Panic

Detecting Adversarial Image Examples in Deep Neural Networks with Adaptive Noise Reduction

Enhanced countering adversarial attacks via input denoising and feature restoring

Generating Imperceptible Adversarial Patch Based on Vulnerable Targeted Attack

Intermediate-Layer Transferable Adversarial Attack With DNN Attention

Improving the Invisibility of Adversarial Examples with Perceptually Adaptive Perturbation