Abstract: Adversarial attacks on deep-learning models pose a serious threat to their reliability and security. Existing defense mechanisms are narrow: they either address a specific type of attack or remain vulnerable to sophisticated attacks. We propose a new defense mechanism that, while focused on image-based classifiers, is general with respect to that category. It is rooted in hyperspace projection. In particular, our solution applies a pseudo-random projection that maps the original dataset into a new dataset. The proposed defense mechanism creates a set of diverse projected datasets and uses each one to train a separate classifier, yielding several trained classifiers with different decision boundaries. At test time, it randomly selects one of these classifiers to classify the input. Our approach does not sacrifice accuracy on legitimate inputs. Besides detailing and thoroughly characterizing our defense mechanism, we provide a proof of concept using four optimization-based adversarial attacks (PGD, FGSM, IGSM, and C&W) and a generative adversarial attack, all tested on the MNIST dataset. Our experimental results show that our solution increases the robustness of deep learning models against adversarial attacks and significantly reduces the attack success rate, by at least 89% for optimization attacks and 78% for generative attacks. We also analyze the relationship between the number of hyperspaces used and the efficacy of the defense mechanism. As expected, the two are positively correlated, which offers an easy-to-tune parameter for enforcing the desired level of security. The generality, scalability, and adaptability of our solution to different attack scenarios, combined with the results achieved, provide a robust defense against adversarial attacks on deep learning networks and lay the groundwork for future research in the field.
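
The pipeline described in the abstract (project the dataset pseudo-randomly, train one classifier per projected dataset, pick a classifier at random at test time) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the projection is modeled as a seeded Gaussian random matrix, the classifier is a tiny nearest-centroid stand-in rather than a deep image classifier, and the names `make_projection`, `NearestCentroid`, and `RandomizedEnsemble` are hypothetical.

```python
import numpy as np


class NearestCentroid:
    """Tiny stand-in classifier (assumption: the paper trains deep image classifiers)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One centroid per class, computed in the projected space.
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Squared Euclidean distance from every sample to every centroid.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]


def make_projection(seed, dim):
    """Hypothetical pseudo-random projection: a Gaussian matrix derived from a seed."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dim, dim)) / np.sqrt(dim)


class RandomizedEnsemble:
    """One classifier per projected dataset; a random one answers each query."""

    def __init__(self, n_hyperspaces, dim, master_seed=0, selector_seed=None):
        seeds = np.random.default_rng(master_seed).integers(0, 2**32, size=n_hyperspaces)
        self.projections = [make_projection(int(s), dim) for s in seeds]
        self.models = []
        # selector_seed=None draws fresh entropy, so the choice is not predictable.
        self._selector = np.random.default_rng(selector_seed)

    def fit(self, X, y):
        # Each model is trained on its own projected copy of the dataset.
        self.models = [NearestCentroid().fit(X @ P, y) for P in self.projections]
        return self

    def predict(self, X):
        # Randomly select one classifier, then apply its matching projection.
        k = int(self._selector.integers(len(self.models)))
        return self.models[k].predict(X @ self.projections[k])


# Toy usage on two well-separated synthetic clusters (not MNIST).
rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((100, 8)), rng.standard_normal((100, 8)) + 10.0])
y = np.array([0] * 100 + [1] * 100)
ensemble = RandomizedEnsemble(n_hyperspaces=5, dim=8, master_seed=7).fit(X, y)
predictions = ensemble.predict(X)
```

In this sketch, `n_hyperspaces` plays the role of the tunable parameter mentioned in the abstract: an attacker who crafts an input against one projected classifier cannot know which classifier will be selected for the query.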

Adversarial Attacks, Regression, and Numerical Stability Regularization

Attack As Defense: Characterizing Adversarial Examples Using Robustness

Adversarial Attacks on Regression Systems Via Gradient Optimization

Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing Their Input Gradients

Regularization for Adversarial Robust Learning

Local Competition and Uncertainty for Adversarial Robustness in Deep Learning

Overparameterized Linear Regression under Adversarial Attacks

Regularization properties of adversarially-trained linear regression

Not So Robust After All: Evaluating the Robustness of Deep Neural Networks to Unseen Adversarial Attacks

Evaluating Model Robustness Using Adaptive Sparse L0 Regularization

Minimax rates of convergence for nonparametric regression under adversarial attacks

Adversarial robustness improvement for deep neural networks

Singular Regularization with Information Bottleneck Improves Model's Adversarial Robustness

How adversarial attacks can disrupt seemingly stable accurate classifiers

Towards Robust Training of Neural Networks by Regularizing Adversarial Gradients

Adversarial Attacks Neutralization via Data Set Randomization

Stability Analysis and Generalization Bounds of Adversarial Training

The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness

Adversarial Robustness of Stabilized NeuralODEs Might be from Obfuscated Gradients

Jacobian Adversarially Regularized Networks for Robustness

Adversarial Robustness of Neural Networks From the Perspective of Lipschitz Calculus: A Survey