Abstract: Recent studies have demonstrated that deep neural networks (DNNs) are vulnerable to adversarial examples, which seriously threatens security-sensitive applications. Existing works synthesize adversarial examples by perturbing the original (benign) images and using the L_p-norm to penalize the perturbations, which restricts the pixel-wise distance between the adversarial images and the corresponding benign images. However, they add perturbations globally to the benign images without explicitly considering their content or spatial structure, resulting in noticeable artifacts, especially in originally clean regions such as sky and smooth surfaces. In this paper, we propose an invisible adversarial attack, which synthesizes adversarial examples that are visually indistinguishable from benign ones. We adaptively distribute the perturbation according to human sensitivity to a local stimulus in the benign image, i.e., the less sensitive the human eye is to a region, the more perturbation that region receives. Two types of adaptive adversarial attacks are proposed: 1) coarse-grained and 2) fine-grained. The former applies an L_p-norm penalty regularized by novel spatial constraints, which use the rich information of cluttered regions to mask the perturbation. The latter, called the Just Noticeable Distortion (JND)-based adversarial attack, uses the proposed JND_p metric to better measure perceptual similarity, and adaptively sets the penalty by weighting the pixel-wise perceptual redundancy of an image. We conduct extensive experiments on the MNIST, CIFAR-10 and ImageNet datasets and a comprehensive user study with 50 participants. The experimental results demonstrate that JND_p is a better metric for measuring perceptual similarity than the L_p-norm, and that the proposed adaptive adversarial attacks can synthesize adversarial examples that are indistinguishable from benign ones and outperform the state-of-the-art methods.
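The idea of placing more perturbation where the eye is less sensitive can be illustrated with a minimal sketch. This is not the paper's spatial constraint or its JND_p metric, neither of which is specified in the abstract. It assumes local standard deviation as a crude proxy for how cluttered a region is, and the names `local_std` and `masked_perturbation` are hypothetical.

```python
import numpy as np


def local_std(image, radius=1):
    """Per-pixel standard deviation over a (2*radius+1)^2 window.

    Assumption: local standard deviation stands in for "clutter";
    the paper's actual sensitivity model may differ.
    """
    size = 2 * radius + 1
    # Edge-pad so that every pixel has a full window around it.
    padded = np.pad(image.astype(np.float64), radius, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return windows.std(axis=(-2, -1))


def masked_perturbation(image, perturbation, radius=1, eps=1e-8):
    """Scale a perturbation by normalized local activity of the image.

    Smooth regions (low activity) receive little or no perturbation;
    cluttered regions keep most of it.
    """
    activity = local_std(image, radius)
    weights = activity / (activity.max() + eps)  # values in [0, 1]
    return perturbation * weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy grayscale image: flat left half, textured right half.
    image = np.zeros((8, 8))
    image[:, 4:] = rng.random((8, 4))
    # A uniform perturbation, as a global L_p-style attack might add.
    uniform = np.ones_like(image)
    adapted = masked_perturbation(image, uniform)
    print("mean perturbation, flat region:    ", adapted[:, :3].mean())
    print("mean perturbation, textured region:", adapted[:, 5:].mean())
```

In this toy setup the flat half of the image receives essentially no perturbation while the textured half retains part of it, which is the qualitative behaviour the abstract describes. A real attack would apply such a weighting inside the optimization objective rather than as a post-hoc mask.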

Patch-Wise Attack for Fooling Deep Neural Network

Fooling Neural Network Interpretations - Adversarial Noise to Attack Images

Patch-wise++ Perturbation for Adversarial Targeted Attacks

Detecting Adversarial Image Examples in Deep Neural Networks with Adaptive Noise Reduction

Generating Imperceptible Adversarial Patch Based on Vulnerable Targeted Attack

GreedyFool: Distortion-Aware Sparse Adversarial Attack

Attention Based Adversarial Attacks with Low Perturbations

Invisible Adversarial Attack Against Deep Neural Networks: an Adaptive Penalization Approach

PatchBackdoor: Backdoor Attack against Deep Neural Networks without Model Modification

NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks

Pixel Logo Attack: Embedding Attacks As Logo-Like Pixels

An Effective Way to Boost Black-Box Adversarial Attack

Inconspicuous Adversarial Patches for Fooling Image Recognition Systems on Mobile Devices

Towards Imperceptible and Robust Adversarial Example Attacks Against Neural Networks

TransNoise: Transferable Universal Adversarial Noise for Adversarial Attack

Analysis and Countermeasure Design on Adversarial Patch Attacks

Generating Visually Realistic Adversarial Patch

An Evolutionary-Based Black-Box Attack to Deep Neural Network Classifiers

Query-Efficient Decision-Based Black-Box Patch Attack