Abstract: Adversarial attacks on deep learning models pose a serious threat to their reliability and security. Existing defense mechanisms are narrow: they either address a specific type of attack or remain vulnerable to sophisticated attacks. We propose a new defense mechanism that, while focused on image-based classifiers, is general within that category. It is rooted in hyperspace projection. In particular, our solution applies a pseudo-random projection to the original dataset to obtain a new dataset. The proposed defense mechanism creates a set of diverse projected datasets, each of which is used to train a dedicated classifier, resulting in multiple trained classifiers with different decision boundaries. At test time, it randomly selects one of these classifiers to classify the input. Our approach does not sacrifice accuracy on legitimate inputs. In addition to detailing and thoroughly characterizing our defense mechanism, we provide a proof of concept using four optimization-based adversarial attacks (PGD, FGSM, IGSM, and C&W) and a generative adversarial attack, all tested on the MNIST dataset. Our experimental results show that our solution increases the robustness of deep learning models against adversarial attacks, reducing the attack success rate by at least 89% for optimization-based attacks and 78% for generative attacks. We also analyze the relationship between the number of hyperspaces used and the efficacy of the defense mechanism. As expected, the two are positively correlated, offering an easy-to-tune parameter for enforcing the desired level of security. The generality, scalability, and adaptability of our solution to different attack scenarios, combined with the results achieved, not only provide a robust defense against adversarial attacks on deep learning networks but also lay the groundwork for future research in the field.
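The pipeline described in the abstract (project the dataset pseudo-randomly, train one classifier per projected dataset, pick one classifier at random at test time) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the projection is a seeded Gaussian random matrix, a nearest-centroid classifier stands in for the deep model, synthetic blobs stand in for MNIST, and the names `make_projection`, `CentroidClassifier`, `ProjectionEnsemble`, and `demo` are hypothetical.

```python
import numpy as np


def make_projection(dim, out_dim, seed):
    """Seeded Gaussian random matrix (assumed stand-in for the paper's projection)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dim, out_dim)) / np.sqrt(out_dim)


class CentroidClassifier:
    """Nearest-centroid classifier, a lightweight stand-in for a deep model."""

    def fit(self, x, y):
        self.labels_ = np.unique(y)
        self.centroids_ = np.stack([x[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, x):
        # Distance from every sample to every class centroid.
        dists = np.linalg.norm(x[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.labels_[dists.argmin(axis=1)]


class ProjectionEnsemble:
    """One classifier per projected dataset; a random one answers each query."""

    def __init__(self, n_spaces, dim, out_dim, seed=0):
        # Each hyperspace gets its own seed, hence its own projection.
        self.projections = [
            make_projection(dim, out_dim, seed + i) for i in range(n_spaces)
        ]
        self.models = []
        self._selector = np.random.default_rng(seed + n_spaces)

    def fit(self, x, y):
        # Train each classifier on its own projected copy of the data.
        self.models = [CentroidClassifier().fit(x @ p, y) for p in self.projections]
        return self

    def predict(self, x):
        # Randomly select one (projection, classifier) pair for this call.
        i = int(self._selector.integers(len(self.models)))
        return self.models[i].predict(x @ self.projections[i])


def demo():
    """Fit the ensemble on two synthetic Gaussian blobs and report accuracy."""
    rng = np.random.default_rng(42)
    x = np.vstack([
        rng.normal(-2.0, 1.0, size=(200, 16)),
        rng.normal(2.0, 1.0, size=(200, 16)),
    ])
    y = np.array([0] * 200 + [1] * 200)
    ens = ProjectionEnsemble(n_spaces=5, dim=16, out_dim=8).fit(x, y)
    acc = float((ens.predict(x) == y).mean())
    return ens, acc


if __name__ == "__main__":
    _, accuracy = demo()
    print(f"accuracy on legitimate inputs: {accuracy:.3f}")
```

In this sketch, `n_spaces` plays the role of the number of hyperspaces: raising it enlarges the pool of classifiers an attacker would have to anticipate, at the cost of training more models.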
