Abstract: Adversarial attacks on deep-learning models pose a serious threat to their reliability and security. Existing defense mechanisms are narrow: they either address a specific type of attack or remain vulnerable to sophisticated attacks. We propose a new defense mechanism that, while focused on image-based classifiers, is general with respect to that category. It is rooted in hyperspace projection: our solution provides a pseudo-random projection of the original dataset into a new dataset. The mechanism creates a set of diverse projected datasets, each used to train a specific classifier, resulting in different trained classifiers with different decision boundaries. During testing, it randomly selects one classifier to evaluate the input. Our approach does not sacrifice accuracy on legitimate input. Besides detailing and thoroughly characterizing our defense mechanism, we provide a proof of concept using four optimization-based adversarial attacks (PGD, FGSM, IGSM, and C&W) and a generative adversarial attack, testing them on the MNIST dataset. Our experimental results show that our solution increases the robustness of deep-learning models against adversarial attacks and significantly reduces the attack success rate, by at least 89% for optimization attacks and 78% for generative attacks. We also analyze the relationship between the number of hyperspaces used and the efficacy of the defense mechanism. As expected, the two are positively correlated, offering an easy-to-tune parameter to enforce the desired level of security. The generality and scalability of our solution and its adaptability to different attack scenarios, combined with the results achieved, provide a robust defense against adversarial attacks on deep-learning networks and lay the groundwork for future research in the field.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is the reliability and security of deep neural network models in the face of adversarial attacks. Existing defense mechanisms are often narrowly targeted: they handle only specific types of attacks or fail against complex attack methods. The paper therefore proposes a new defense mechanism that aims to strengthen the resistance of image classifiers to adversarial attacks through dataset randomization, without sacrificing accuracy on normal inputs.

### Core Problems of the Paper

1. **Threat of Adversarial Attacks**: Adversarial attacks cause deep-learning models to make incorrect predictions by applying small but carefully designed perturbations to input samples, thereby threatening the reliability and security of the models.
2. **Limitations of Existing Defense Mechanisms**: Existing defense methods such as adversarial training, gradient masking, distillation techniques, input transformation, and adversarial sample detection can alleviate adversarial attacks to a certain extent, but are often ineffective against stronger attacks and are easily bypassed.

### Solutions

The paper proposes a dataset randomization method based on high-dimensional space projection. The specific steps are as follows:

1. **High-Dimensional Space Projection**: Use randomly generated images to project the original dataset into a new high-dimensional space. Each random image generates one projected dataset.
2. **Multi-Classifier Training**: Use the different projected datasets to train multiple classifiers, so that each classifier has a different decision boundary.
3. **Random Selection in the Testing Phase**: In the testing phase, randomly select a projected image and its corresponding trained classifier to make predictions on the input.

### Main Contributions

1. **Propose a New Defense Mechanism**: Use random images for high-dimensional space projection to enhance the robustness of classifiers.
2. **Maintain High Classification Accuracy**: While improving the defense against adversarial attacks, the method does not sacrifice classification accuracy on normal inputs.
3. **No Need for Additional Defense Networks**: It can be easily implemented on existing classifiers with low computational resource requirements.
4. **Fully Adjustable**: The effectiveness of the defense mechanism can be controlled by adjusting the number of random images.
5. **Experimental Verification**: Experiments were carried out on the MNIST dataset, and the results show that the method has a significant defense effect against both optimization-based and generation-based adversarial attacks. The attack success rates were reduced by at least 89% and 78% respectively.

### Experimental Results

- **Optimization-Based Adversarial Attacks**: For PGD, FGSM, IGSM, and C&W, the attack success rate was reduced to less than 10%.
- **Generation-Based Adversarial Attacks**: When more than 4 random images are used, the attack success rate is also reduced to below the random classification rate.
- **Influence of the Number of Random Images**: The number of random images is positively correlated with the effectiveness of the defense mechanism. Increasing the number of random images can further improve the defense effect.

### Conclusion

The dataset randomization method based on high-dimensional space projection proposed in this paper not only provides an effective defense strategy against adversarial attacks in theory, but also performs well in practical applications, laying the foundation for future research.
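The three steps listed under Solutions (project the dataset with random images, train one classifier per projection, pick one pair at random at test time) can be illustrated with a minimal NumPy sketch. Several parts are assumptions and not taken from the paper: the projection operator is not specified in this summary, so a pixel-wise product with a pseudo-random "key" image stands in for it; a nearest-centroid classifier stands in for the deep network; and a small synthetic two-class dataset replaces MNIST. The names `make_key_images`, `project`, `NearestCentroid`, `RandomizedEnsemble`, and `toy_data` are hypothetical.

```python
import numpy as np


def make_key_images(n_keys, shape, seed=0):
    """Generate n_keys pseudo-random images, reproducible from a seed."""
    rng = np.random.default_rng(seed)
    return rng.random((n_keys, *shape))


def project(images, key_image):
    """Assumed stand-in projection: pixel-wise product with the key image."""
    return images * key_image


class NearestCentroid:
    """Tiny stand-in classifier (the paper trains deep networks instead)."""

    def fit(self, X, y):
        flat = X.reshape(len(X), -1)
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack(
            [flat[y == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, X):
        flat = X.reshape(len(X), -1)
        # Squared Euclidean distance from every sample to every centroid.
        dists = ((flat[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[dists.argmin(axis=1)]


class RandomizedEnsemble:
    """One classifier per projected dataset; one pair chosen at random per query."""

    def __init__(self, n_keys, image_shape, seed=0):
        self.keys = make_key_images(n_keys, image_shape, seed)
        self.models = []
        self.rng = np.random.default_rng(seed + 1)

    def fit(self, X, y):
        # Steps 1 and 2: project the dataset with each key, train on each projection.
        self.models = [NearestCentroid().fit(project(X, k), y) for k in self.keys]
        return self

    def predict(self, X):
        # Step 3: pick a random (key, classifier) pair and classify with it.
        i = int(self.rng.integers(len(self.models)))
        return self.models[i].predict(project(X, self.keys[i]))


def toy_data(n_per_class=50, seed=42):
    """Synthetic 8x8 images: class 0 is bright on the left, class 1 on the right."""
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, 0.1, size=(2 * n_per_class, 8, 8))
    y = np.repeat([0, 1], n_per_class)
    X[y == 0, :, :4] += 1.0
    X[y == 1, :, 4:] += 1.0
    return X, y


if __name__ == "__main__":
    X, y = toy_data()
    defense = RandomizedEnsemble(n_keys=5, image_shape=(8, 8)).fit(X, y)
    print("accuracy on clean inputs:", (defense.predict(X) == y).mean())
```

In this sketch, `n_keys` plays the role of the number of random images that the summary describes as the tunable parameter. Because each key weights the pixels differently, each stand-in classifier ends up with different centroids in its own projected space, which loosely mirrors the idea of classifiers with different decision boundaries.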

Adversarial Attacks Neutralization via Data Set Randomization
