Abstract: As a new programming paradigm, deep learning (DL) has achieved impressive performance in areas such as image processing and speech recognition, and its application has expanded to many real-world problems. However, neural networks and DL are normally black-box systems; even worse, DL-based software is vulnerable to abnormal examples, such as adversarial and backdoored examples constructed by attackers with malicious intent, as well as unintentionally mislabeled samples. It is therefore important and urgent to detect such abnormal examples. Although various detection approaches have been proposed, each addressing some specific types of abnormal examples, they suffer from limitations, and the problem remains of considerable interest. In this work, we first propose a novel characterization that distinguishes abnormal examples from normal ones, based on the observation that abnormal examples have significantly different (adversarial) robustness from normal ones. We systematically analyze these three types of abnormal samples in terms of robustness and find that they have different characteristics from normal ones. As robustness measurement is computationally expensive and hence challenging to scale to large networks, we then propose to effectively and efficiently measure the robustness of an input sample using the cost of adversarially attacking the input, which was originally proposed to test the robustness of neural networks against adversarial examples. Next, we propose a novel detection method, named attack as detection (A²D for short), which uses the cost of adversarially attacking an input, instead of its robustness, to check whether it is abnormal. Our detection method is generic, and various adversarial attack methods can be leveraged. Extensive experiments show that A²D is more effective than recent promising approaches that were proposed to detect only one specific type of abnormal examples.
We also thoroughly discuss possible adaptive attack methods against our adversarial example detection method and show that A²D remains effective in defending against carefully designed adaptive adversarial attacks; for example, the attack success rate drops to 0% on CIFAR10.
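The core idea of using attack cost as a stand-in for robustness can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the abstract: the toy linear classifier, the iterative sign-gradient attack, the choice of "number of attack steps" as the cost, the percentile-based threshold, and the names `attack_cost`, `fit_threshold`, and `is_abnormal`. The sketch also assumes that abnormal inputs are cheaper to attack than normal ones, whereas the abstract only states that their robustness differs.

```python
import numpy as np


def attack_cost(w, b, x, step=0.05, max_steps=200):
    """Count sign-gradient steps needed to flip a linear classifier's label.

    Hypothetical cost measure: the number of attack iterations is used
    here as a cheap proxy for the robustness of the input x.
    """
    positive = (w @ x + b) >= 0          # original predicted class
    direction = 1.0 if positive else -1.0
    x_adv = x.astype(float).copy()
    for i in range(1, max_steps + 1):
        # For the linear score w.x + b the gradient w.r.t. x is w, so
        # stepping against direction * sign(w) pushes the score toward
        # the decision boundary.
        x_adv = x_adv - step * direction * np.sign(w)
        if ((w @ x_adv + b) >= 0) != positive:
            return i                      # label flipped after i steps
    return max_steps                      # attack failed within the budget


def fit_threshold(normal_costs, percentile=5.0):
    """Assumed calibration: a low percentile of costs on known-normal inputs."""
    return float(np.percentile(normal_costs, percentile))


def is_abnormal(cost, threshold):
    """Flag inputs that are unusually cheap to attack (an assumption)."""
    return cost < threshold


# Usage: calibrate on inputs far from the boundary, then score new inputs.
w, b = np.array([1.0, -2.0]), 0.0
normal_inputs = [np.array([3.0, -1.0]), np.array([2.5, -2.0]), np.array([4.0, 0.0])]
threshold = fit_threshold([attack_cost(w, b, x) for x in normal_inputs])
suspicious = np.array([0.2, 0.0])         # sits close to the decision boundary
flagged = is_abnormal(attack_cost(w, b, suspicious), threshold)
```

In this toy setting an input close to the decision boundary needs far fewer steps to flip than the calibration inputs, so it falls below the threshold. With a real network, the linear attack would be replaced by whichever adversarial attack method is chosen, and the cost and threshold definitions would need to be validated empirically.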

Adversarial Detection Based on Inner-Class Adjusted Cosine Similarity

Attack As Detection: Using Adversarial Attack Methods to Detect Abnormal Examples.

Adversarial Detection Based on Local Cosine Similarity

Adversarial Examples Detection with Enhanced Image Difference Features based on Local Histogram Equalization

FCGSM: Fast Conjugate Gradient Sign Method for Adversarial Attack on Image Classification

Research on Adversarial Sample Detection Method Based on Image Similarity

Integration of Statistical Detector and Gaussian Noise Injection Detector for Adversarial Example Detection in Deep Neural Networks

Feature Decoupling Based Adversarial Examples Detection Method for Remote Sensing Scene Classification

A Fast Adversarial Sample Detection Approach for Industrial Internet-of-Things Applications.

Detecting Adversarial Image Examples in Deep Neural Networks with Adaptive Noise Reduction

Learning to Characterize Adversarial Subspaces.

Internal Wasserstein Distance for Adversarial Attack and Defense

Detecting Adversarial Samples for Deep Learning Models: A Comparative Study

Detecting Adversarial Examples Via Prediction Difference for Deep Neural Networks.

An Interpretive Adversarial Attack Method: Attacking Softmax Gradient Layer-Wise Relevance Propagation Based on Cosine Similarity Constraint and TS-Invariant

ADS-detector: An attention-based dual stream adversarial example detection method

GGCAD: A Novel Method of Adversarial Detection by Guided Grad-CAM

Learning to Detect Adversarial Examples Based on Class Scores

Detecting Adversarial Examples from Sensitivity Inconsistency of Spatial-Transform Domain

Invisible Adversarial Attack Against Deep Neural Networks: an Adaptive Penalization Approach

Model-agnostic Adversarial Example Detection via High-Frequency Amplification