Abstract: Adversarial attacks on deep-learning models pose a serious threat to their reliability and security. Existing defense mechanisms are narrow: they either address a specific type of attack or remain vulnerable to sophisticated attacks. We propose a new defense mechanism that, while focused on image-based classifiers, applies generally across that category. It is rooted in hyperspace projection. In particular, our solution applies a pseudo-random projection to the original dataset to obtain a new dataset. The proposed defense mechanism creates a set of diverse projected datasets, each of which is used to train a dedicated classifier, resulting in multiple trained classifiers with different decision boundaries. At test time, it randomly selects one of these classifiers to classify the input. Our approach does not sacrifice accuracy on legitimate inputs. In addition to detailing and thoroughly characterizing our defense mechanism, we provide a proof of concept using four optimization-based adversarial attacks (PGD, FGSM, IGSM, and C\&W) and a generative adversarial attack, all tested on the MNIST dataset. Our experimental results show that our solution increases the robustness of deep-learning models against adversarial attacks and reduces the attack success rate by at least 89% for optimization attacks and 78% for generative attacks. We also analyze the relationship between the number of hyperspaces used and the efficacy of the defense mechanism. As expected, the two are positively correlated, which offers an easy-to-tune parameter for enforcing the desired level of security. The generality and scalability of our solution and its adaptability to different attack scenarios, combined with the results achieved, provide a robust defense against adversarial attacks on deep-learning networks and lay the groundwork for future research in the field.
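
The pipeline described in the abstract (project the dataset into several pseudo-random hyperspaces, train one classifier per projection, pick a classifier at random at test time) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the Gaussian random matrices stand in for the hyperspace projection, a nearest-centroid rule stands in for the deep classifier, the data are synthetic rather than MNIST, and the names `ProjectedEnsemble` and `make_toy_data` are hypothetical.

```python
import numpy as np


def make_toy_data(n_per_class=200, dim=64, seed=0):
    """Two well-separated Gaussian blobs (a synthetic stand-in for image data)."""
    rng = np.random.default_rng(seed)
    x0 = rng.normal(loc=-3.0, scale=1.0, size=(n_per_class, dim))
    x1 = rng.normal(loc=+3.0, scale=1.0, size=(n_per_class, dim))
    X = np.vstack([x0, x1])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y


class ProjectedEnsemble:
    """One simple classifier per pseudo-random projection (illustrative only)."""

    def __init__(self, n_hyperspaces, input_dim, projected_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        # Assumption: a seeded Gaussian matrix plays the role of each projection.
        self.projections = [
            self.rng.normal(size=(input_dim, projected_dim)) / np.sqrt(projected_dim)
            for _ in range(n_hyperspaces)
        ]
        self.centroids = []
        self.classes = None

    def fit(self, X, y):
        """Train one nearest-centroid model on each projected copy of the data."""
        self.classes = np.unique(y)
        self.centroids = []
        for P in self.projections:
            Z = X @ P  # the projected dataset for this hyperspace
            self.centroids.append(
                np.stack([Z[y == c].mean(axis=0) for c in self.classes])
            )
        return self

    def predict(self, X):
        """Classify each input with a classifier chosen uniformly at random."""
        chosen = self.rng.integers(len(self.projections), size=len(X))
        preds = np.empty(len(X), dtype=self.classes.dtype)
        for i, (x, k) in enumerate(zip(X, chosen)):
            z = x @ self.projections[k]
            dists = np.linalg.norm(self.centroids[k] - z, axis=1)
            preds[i] = self.classes[np.argmin(dists)]
        return preds, chosen


# Usage: the number of hyperspaces is the tunable parameter the abstract mentions.
X, y = make_toy_data()
model = ProjectedEnsemble(n_hyperspaces=5, input_dim=64, projected_dim=32).fit(X, y)
preds, chosen = model.predict(X)
print("accuracy on clean toy inputs:", (preds == y).mean())
```

Because the classifier is drawn at random for every query, an attacker who crafts a perturbation against one decision boundary cannot know in advance which boundary will judge it. The sketch only shows this selection mechanism and makes no claim about robustness figures.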

Multi-attacks: Many images $+$ the same adversarial attack $\to$ many target labels

F&F Attack: Adversarial Attack Against Multiple Object Trackers by Inducing False Negatives and False Positives

Clean-image Backdoor: Attacking Multi-label Models with Poisoned Labels Only

Showing Many Labels in Multi-label Classification Models: An Empirical Study of Adversarial Examples

MALT Powers Up Adversarial Attacks

A Universal Targeted Attack Method against Image Classification

Decision-based Universal Adversarial Attack

Pick-Object-Attack: Type-specific adversarial attack for object detection

Sparse vs Contiguous Adversarial Pixel Perturbations in Multimodal Models: An Empirical Analysis

Keep on Swimming: Real Attackers Only Need Partial Knowledge of a Multi-Model System

Comprehensive Comparisons of Gradient-Based Multi-Label Adversarial Attacks

Mutual-modality Adversarial Attack with Semantic Perturbation

When Measures are Unreliable: Imperceptible Adversarial Perturbations toward Top-$k$ Multi-Label Learning

How adversarial attacks can disrupt seemingly stable accurate classifiers

2N labeling defense method against adversarial attacks by filtering and extended class label set

Robust Superpixel-Guided Attentional Adversarial Attack

Adversarial Training and Robustness for Multiple Perturbations

One Noise to Rule Them All: Multi-View Adversarial Attacks with Universal Perturbation

Attacking Adversarial Attacks as A Defense

Adversarial Attacks Neutralization via Data Set Randomization

Learning to Attack with Fewer Pixels: A Probabilistic Post-hoc Framework for Refining Arbitrary Dense Adversarial Attacks