Abstract: We present a scalable, black-box, perception-in-the-loop technique for finding adversarial examples for deep neural network classifiers. Black-box means that our procedure has only input-output access to the classifier, and no access to its internal structure, parameters, or intermediate confidence values. Perception-in-the-loop means that the notion of proximity between inputs can be queried directly from human participants rather than taken from an arbitrarily chosen metric. Our technique is based on the covariance matrix adaptation evolution strategy (CMA-ES), a black-box optimization approach. CMA-ES explores the search space iteratively: it generates a population of candidates according to a distribution, chooses the best candidates according to a cost function, and updates the posterior distribution to favor the best candidates. We run CMA-ES with human participants providing the fitness function, using the insight that the choice of best candidates in CMA-ES can be naturally modeled as a perception task: pick the top $k$ inputs perceptually closest to a fixed input. We empirically demonstrate that finding adversarial examples is feasible using small populations and few iterations. On the MNIST benchmark, we compare CMA-ES with other black-box approaches using $L_p$ norms as the cost function, and show that it performs favorably both in its success rate in finding adversarial examples and in minimizing the distance between the original and the adversarial input. In experiments on the MNIST, CIFAR10, and GTSRB benchmarks, we demonstrate that CMA-ES with perception-in-the-loop can find perceptually similar adversarial inputs with a small number of iterations and small population sizes. Finally, we show that networks trained specifically to be robust against $L_\infty$-norm perturbations can still be susceptible to perceptually similar adversarial examples.
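The generate, select, and update loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and not full CMA-ES: it uses a simplified diagonal-variance evolution strategy (closer to the cross-entropy method), and the names `toy_classifier`, `cost`, and `evolve_adversarial`, along with all parameter values, are hypothetical. The cost here is an $L_2$ distance; in the perception-in-the-loop setting, the sorting step is where human participants would instead pick the top $k$ candidates.

```python
import math
import random


def toy_classifier(x):
    """Hypothetical black-box classifier: exposes only a label, no scores."""
    return 1 if sum(x) > 0.0 else 0


def cost(candidate, original, original_label, classify):
    """L2 distance to the original, plus a large penalty if the label is unchanged."""
    dist = math.sqrt(sum((c - o) ** 2 for c, o in zip(candidate, original)))
    penalty = 0.0 if classify(candidate) != original_label else 1e3
    return dist + penalty


def evolve_adversarial(original, classify, population=20, top_k=5,
                       iterations=50, sigma=0.5, seed=0):
    """Simplified evolution strategy (assumption: diagonal variance, not full CMA-ES)."""
    rng = random.Random(seed)
    label = classify(original)
    dim = len(original)
    mean, std = list(original), [sigma] * dim
    best, best_cost = None, float("inf")

    for _ in range(iterations):
        # 1. Generate a population of candidates from the current distribution.
        candidates = [[rng.gauss(mean[i], std[i]) for i in range(dim)]
                      for _ in range(population)]

        # 2. Choose the best candidates. With perception-in-the-loop, this
        #    ranking would come from human participants, not a metric.
        candidates.sort(key=lambda c: cost(c, original, label, classify))
        elite = candidates[:top_k]

        # Remember the closest misclassified candidate seen so far.
        top_cost = cost(elite[0], original, label, classify)
        if classify(elite[0]) != label and top_cost < best_cost:
            best, best_cost = elite[0], top_cost

        # 3. Update the distribution to favor the best candidates.
        mean = [sum(e[i] for e in elite) / top_k for i in range(dim)]
        std = [max(1e-3, math.sqrt(sum((e[i] - mean[i]) ** 2 for e in elite) / top_k))
               for i in range(dim)]

    return best  # None if no misclassified candidate was found
```

Calling `evolve_adversarial([0.1, 0.1, 0.1], toy_classifier)` returns a nearby point that the toy classifier labels differently from the original, or `None` if the search fails. Full CMA-ES additionally adapts a full covariance matrix and a global step size, which this sketch omits for brevity.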
