Abstract: Deep neural networks (DNNs) have achieved great success in many tasks in recent years. However, researchers have found that DNNs are vulnerable to adversarial examples, i.e., maliciously perturbed inputs. These carefully designed adversarial perturbations can easily confuse the model while having no impact on human perception. To counter adversarial examples, we propose an integrated detection framework that combines a statistical detector and a Gaussian noise injection detector. The statistical detector extracts the Subtractive Pixel Adjacency Matrix (SPAM) and models it with a second-order Markov transition probability matrix so as to highlight the statistical anomalies hidden in an adversarial input. An ensemble classifier using SPAM-based features is then applied to detect adversarial inputs containing large perturbations. The Gaussian noise injection detector first injects additive Gaussian noise into the input, and then feeds both the original input and its noise-injected counterpart into the targeted network. By comparing the difference between the two outputs, the detector identifies adversarial inputs containing small perturbations: if the difference exceeds a threshold, the input is classified as adversarial; otherwise, it is classified as legitimate. The two detectors are adapted to different characteristics of adversarial perturbations, so the proposed framework is capable of detecting multiple types of adversarial examples. We test six categories of adversarial examples produced by the Fast Gradient Sign Method (FGSM, untargeted), the Randomized Fast Gradient Sign Method (R-FGSM, untargeted), the Basic Iterative Method (BIM, untargeted), DeepFool (untargeted), and the Carlini & Wagner method in its untargeted (CW_UT) and targeted (CW_T) variants. Comprehensive empirical results show that the proposed detection framework achieves promising performance on the ImageNet database.
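
The decision rule of the Gaussian noise injection detector can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the output difference is measured, so the L1 distance between the two softmax outputs is an assumption here, as are the noise level `sigma`, the threshold value, and the hypothetical `toy_model` that stands in for the targeted network.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def toy_model(x, weights):
    """Hypothetical stand-in for the targeted network: linear layer + softmax."""
    return softmax(weights @ x.ravel())


def noise_injection_score(model, x, sigma=0.05, rng=None):
    """Difference between the model outputs for x and its noisy counterpart.

    Assumption: the difference is the L1 distance between the two
    probability vectors; the abstract does not name a specific measure.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Inject additive Gaussian noise and keep pixels in the valid [0, 1] range.
    noisy = np.clip(x + rng.normal(0.0, sigma, size=x.shape), 0.0, 1.0)
    return float(np.abs(model(x) - model(noisy)).sum())


def is_adversarial(model, x, threshold, sigma=0.05, rng=None):
    """Flag the input as adversarial if the output difference exceeds the threshold."""
    return noise_injection_score(model, x, sigma, rng) > threshold
```

In practice the threshold would have to be calibrated on legitimate inputs, for example by choosing a value that keeps the false positive rate at an acceptable level; the abstract does not describe how the authors select it.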

Adaptive Image Adversarial Example Detection Based on Class Activation Mapping

An Adversarial Attack Via Feature Contributive Regions

Detecting Adversarial Image Examples in Deep Neural Networks with Adaptive Noise Reduction

Defense Against Adversarial Attacks via Adversarial Noise Denoising Networks in Image Recognition

Integration of Statistical Detector and Gaussian Noise Injection Detector for Adversarial Example Detection in Deep Neural Networks

GGCAD: A Novel Method of Adversarial Detection by Guided Grad-CAM

Model-agnostic Adversarial Example Detection via High-Frequency Amplification

ADDITION: Detecting Adversarial Examples with Image-Dependent Noise Reduction

AEGuard: Image Feature-Based Independent Adversarial Example Detection Model

Adversarial Examples Detection Based on Error Level Analysis and Space Mapping

CAMA: Class Activation Mapping Disruptive Attack for Deep Neural Networks

Adversarial Examples Detection with Enhanced Image Difference Features based on Local Histogram Equalization

Enhancing Generalization in Few-Shot Learning for Detecting Unknown Adversarial Examples

An efficient adversarial example generation algorithm based on an accelerated gradient iterative fast gradient

Adversarial Feature Genome: a Data Driven Adversarial Examples Recognition Method

Adversarial Detection Based on Inner-Class Adjusted Cosine Similarity

Research on an Adaptive Neural Network K-Pixel Adversarial Example Generation Algorithm

Detecting Adversarial Examples Via Prediction Difference for Deep Neural Networks

A Data-driven Adversarial Examples Recognition Framework Via Adversarial Feature Genome

Detecting and Classifying Adversarial Examples Based on DCT Transform

Adversarial example detection based on saliency map features