Abstract: As a new programming paradigm, deep learning (DL) has achieved impressive performance in areas such as image processing and speech recognition, and its application has expanded to many real-world problems. However, neural networks and DL are normally black-box systems; even worse, DL-based software is vulnerable to abnormal examples, such as adversarial and backdoored examples constructed by attackers with malicious intent, as well as unintentionally mislabeled samples. It is therefore important and urgent to detect such abnormal examples. Although various detection approaches have been proposed, each addressing specific types of abnormal examples, they suffer from limitations, and the problem remains of considerable interest. In this work, we first propose a novel characterization that distinguishes abnormal examples from normal ones, based on the observation that abnormal examples have significantly different (adversarial) robustness from normal ones. We systematically analyze these three types of abnormal samples in terms of robustness and find that their characteristics differ from those of normal samples. As robustness measurement is computationally expensive and hence challenging to scale to large networks, we then propose to measure the robustness of an input sample effectively and efficiently using the cost of adversarially attacking the input, which was originally proposed to test the robustness of neural networks against adversarial examples. Next, we propose a novel detection method, named attack as detection (A²D for short), which uses the cost of adversarially attacking an input, instead of its robustness, to check whether it is abnormal. Our detection method is generic, and various adversarial attack methods can be leveraged. Extensive experiments show that A²D is more effective than recent promising approaches that were each proposed to detect only one specific type of abnormal example.
We also thoroughly discuss possible adaptive attacks against our adversarial example detection method and show that A²D remains effective in defending against carefully designed adaptive adversarial attacks; for example, the attack success rate drops to 0% on CIFAR10.
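
The idea of using attack cost as a detection signal can be illustrated with a minimal sketch. It is not the authors' implementation: the toy linear classifier, the sign-gradient attack, the step size, the threshold, and the names `attack_cost` and `is_abnormal` are all assumptions made for illustration. The sketch also assumes that abnormal inputs are cheaper to attack than normal ones; the abstract states only that their robustness differs significantly.

```python
import numpy as np

def attack_cost(w, b, x, step=0.01, max_steps=1000):
    """Count sign-gradient steps needed to flip the label of x under a
    toy linear classifier sign(w.x + b); return max_steps if it never flips."""
    label = np.sign(w @ x + b)
    x_adv = x.astype(float).copy()
    for k in range(1, max_steps + 1):
        # Push the input against its current label along sign(w),
        # the gradient sign of the logit for a linear model.
        x_adv -= step * label * np.sign(w)
        if np.sign(w @ x_adv + b) != label:
            return k
    return max_steps

def is_abnormal(w, b, x, threshold):
    """Flag x when attacking it costs fewer steps than the threshold
    (hypothetical decision rule; the threshold would need calibration)."""
    return attack_cost(w, b, x) < threshold

# Toy data: one input far from the decision boundary, one very close to it.
w = np.array([1.0, -1.0])
b = 0.0
normal = np.array([2.0, -2.0])
suspect = np.array([0.05, -0.04])

cost_normal = attack_cost(w, b, normal)
cost_suspect = attack_cost(w, b, suspect)
print(cost_normal, cost_suspect)
print(is_abnormal(w, b, normal, threshold=50),
      is_abnormal(w, b, suspect, threshold=50))
```

In this toy setting the input near the boundary flips after a handful of steps, while the distant input needs many more, so a step-count threshold separates them. With a real network, the cost would come from whichever attack method is plugged in, and the threshold would have to be calibrated on known normal samples.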

AFLF: a defensive framework to defeat multi-faceted adversarial attacks via attention feature fusion

Attack As Defense: Characterizing Adversarial Examples Using Robustness.

Attack As Detection: Using Adversarial Attack Methods to Detect Abnormal Examples.

A Universal Defense Strategy Against Adversarial Attacks Based on Attention-Guided

A Framework for Robust Deep Learning Models Against Adversarial Attacks Based on a Protection Layer Approach

Improving the Robustness of Deep Convolutional Neural Networks Through Feature Learning

Feature decoupling and interaction network for defending against adversarial examples

DeepFense: Online Accelerated Defense Against Adversarial Deep Learning

Improving Adversarial Robustness via Decoupled Visual Representation Masking

A defensive framework for deepfake detection under adversarial settings using temporal and spatial features

DAFAR: Defending against Adversaries by Feedback-Autoencoder Reconstruction

A Simple Framework to Enhance the Adversarial Robustness of Deep Learning-based Intrusion Detection System

Designing defensive techniques to handle adversarial attack on deep learning based model

How to Defend and Secure Deep Learning Models Against Adversarial Attacks in Computer Vision: A Systematic Review

DeepFeature: Guiding adversarial testing for deep neural network systems using robust features

Adaptive Feature Alignment for Adversarial Training

An ADMM-Based Universal Framework for Adversarial Attacks on Deep Neural Networks

From Attack to Defense: Insights into Deep Learning Security Measures in Black-Box Settings

Adversarial robustness improvement for deep neural networks

MixDefense: A Defense-in-Depth Framework for Adversarial Example Detection Based on Statistical and Semantic Analysis

UnMask: Adversarial Detection and Defense Through Robust Feature Alignment