Abstract: State-of-the-art adversarial attacks are aimed at neural network classifiers. By default, neural networks use gradient descent to minimize their loss function. The gradient of a classifier's loss function is used by gradient-based adversarial attacks to generate adversarially perturbed images. We pose the question whether another type of optimization could give neural network classifiers an edge. Here, we introduce a novel approach that uses minimax optimization to foil gradient-based adversarial attacks. Our minimax classifier is the discriminator of a generative adversarial network (GAN) that plays a minimax game with the GAN generator. In addition, our GAN generator projects all points onto a manifold that is different from the original manifold since the original manifold might be the cause of adversarial attacks. To measure the performance of our minimax defense, we use adversarial attacks - Carlini Wagner (CW), DeepFool, Fast Gradient Sign Method (FGSM) - on three datasets: MNIST, CIFAR-10 and German Traffic Sign (TRAFFIC). Against CW attacks, our minimax defense achieves 98.07% (MNIST-default 98.93%), 73.90% (CIFAR-10-default 83.14%) and 94.54% (TRAFFIC-default 96.97%). Against DeepFool attacks, our minimax defense achieves 98.87% (MNIST), 76.61% (CIFAR-10) and 94.57% (TRAFFIC). Against FGSM attacks, we achieve 97.01% (MNIST), 76.79% (CIFAR-10) and 81.41% (TRAFFIC). Our Minimax adversarial approach presents a significant shift in defense strategy for neural network classifiers.
What problem does this paper attempt to address?
The paper addresses the following problem: **how to defend neural network classifiers against gradient-based adversarial attacks**. Most current state-of-the-art adversarial attacks, including Carlini Wagner (CW), DeepFool, and the Fast Gradient Sign Method (FGSM), use the gradient of the classifier's loss function to generate adversarial samples. The authors propose a new defense that relies on minimax optimization instead: the classifier is the discriminator of a Generative Adversarial Network (GAN) and is trained through the minimax game it plays with the GAN generator.
### Specific description of the problem
1. **Existing problems**:
   - Existing adversarial attacks rely mainly on the gradient information of the neural network, for example by computing the gradient of the loss function to generate adversarial samples.
   - These attacks can mislead neural network classifiers into misclassifying adversarial samples.
   - Current defenses, such as adversarial training, improve robustness to some extent, but they are usually effective only against specific types of attacks and are easily bypassed by new ones.
2. **Research objectives**:
   - Propose a new defense strategy that effectively resists gradient-based adversarial attacks.
   - Change the optimization method so that the classifier no longer relies on conventional gradient-descent loss minimization, thereby reducing the success rate of adversarial attacks.
   - Use minimax optimization and the GAN framework to project inputs onto a manifold different from the original one, so that adversarial samples lose their effectiveness on the new manifold.
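To make the notion of a gradient-based attack concrete, here is a minimal sketch of FGSM, one of the three attacks named above. It assumes a toy logistic-regression classifier rather than the paper's networks; the function names, weights, and `epsilon` value are illustrative choices, not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_gradient_wrt_input(x, y, w, b):
    """Gradient of the binary cross-entropy loss with respect to the input x.

    For a logistic model p = sigmoid(w.x + b), dL/dx = (p - y) * w.
    """
    p = sigmoid(x @ w + b)
    return (p - y) * w

def fgsm(x, y, w, b, epsilon):
    """Fast Gradient Sign Method: step by epsilon along the sign of the input gradient."""
    grad = loss_gradient_wrt_input(x, y, w, b)
    return x + epsilon * np.sign(grad)

# Toy model and input (hypothetical values, chosen for illustration only).
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([0.5, 0.2])
y = 1.0  # true label

x_adv = fgsm(x, y, w, b, epsilon=0.5)
p_clean = sigmoid(x @ w + b)    # probability of class 1 on the clean input
p_adv = sigmoid(x_adv @ w + b)  # probability of class 1 on the perturbed input
print(p_clean > 0.5, p_adv > 0.5)
```

In this toy setting the perturbation pushes the predicted probability of the true class below 0.5, so the classifier's decision flips. The attack needs nothing beyond the loss gradient, which is the dependency the minimax defense aims to exploit.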
### Main contributions of the paper
1. **Proposed a new Minimax defense method**: The method defends against adversarial attacks by training the classifier within the GAN framework using minimax optimization.
2. **Identified and targeted gradient-based attacks**: The paper observes that state-of-the-art adversarial attacks are gradient-based, and the Minimax defense is designed around this observation.
3. **Achieved accuracy comparable to that on non-adversarial samples**: Experiments on three datasets, MNIST, CIFAR-10 and German Traffic Sign (TRAFFIC), show that the Minimax defense maintains high classification accuracy under multiple adversarial attacks.
4. **First use of the GAN's minimax method for defense**: According to the authors, this is the first attempt to use the GAN's minimax optimization strategy to resist adversarial attacks.
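For reference, the minimax game mentioned in the contributions can be written in the form of the standard GAN value function. This is the original GAN formulation, given here as an assumption about the general shape of the objective; this summary does not state the paper's exact loss, which may differ (for example, in what the generator takes as input and in how class labels enter the discriminator).

$$
\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
$$

Here $D$ is the discriminator (the classifier being defended), $G$ is the generator, $p_{\text{data}}$ is the data distribution, and $p_{z}$ is the distribution of the generator's input. The discriminator maximizes $V$ while the generator minimizes it, so the classifier is not trained by minimizing a single fixed loss.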
### Experimental results
The paper shows the performance of the Minimax defense method on multiple datasets:
- **MNIST dataset**:
  - Against the FGSM attack, the Minimax defense reaches 97.01% accuracy, close to the default classifier's 98.93%.
  - Against the CW L2 attack, the Minimax defense reaches 98.07% accuracy, far above the 0.84% obtained without defense.
- **CIFAR-10 dataset**:
  - Against the FGSM attack, the Minimax defense reaches 76.79% accuracy, compared with 83.14% for the default classifier.
  - Against the CW L2 attack, the Minimax defense reaches 73.90% accuracy, far above the 8.73% obtained without defense.
- **TRAFFIC dataset**:
  - Against the FGSM attack, the Minimax defense reaches 81.41% accuracy, noticeably below the default classifier's 96.97%.
  - Against the CW L2 attack, the Minimax defense reaches 94.54% accuracy, far above the 1.41% obtained without defense.
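To see how far each defended accuracy sits from the default baseline, the figures reported in the abstract can be tabulated and subtracted. The sketch below assumes that the "default" numbers are the default classifier's accuracy on each dataset; the variable and function names are hypothetical.

```python
# Accuracies (%) as reported in the abstract.
default_acc = {"MNIST": 98.93, "CIFAR-10": 83.14, "TRAFFIC": 96.97}
minimax_acc = {
    "CW": {"MNIST": 98.07, "CIFAR-10": 73.90, "TRAFFIC": 94.54},
    "DeepFool": {"MNIST": 98.87, "CIFAR-10": 76.61, "TRAFFIC": 94.57},
    "FGSM": {"MNIST": 97.01, "CIFAR-10": 76.79, "TRAFFIC": 81.41},
}

def accuracy_gaps(default, defended):
    """Percentage-point gap between default accuracy and defended accuracy under attack."""
    return {
        attack: {ds: round(default[ds] - acc, 2) for ds, acc in per_dataset.items()}
        for attack, per_dataset in defended.items()
    }

gaps = accuracy_gaps(default_acc, minimax_acc)
for attack, per_dataset in gaps.items():
    for dataset, gap in per_dataset.items():
        print(f"{attack:8s} {dataset:8s} gap = {gap:.2f} points")
```

On these numbers the gap stays under 3 points on MNIST for every attack, while FGSM on TRAFFIC shows the largest gap at 15.56 points.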
### Conclusion
The paper proposes a novel defense that combines minimax optimization with the GAN framework. On MNIST, CIFAR-10 and TRAFFIC, it resists gradient-based adversarial attacks (CW, DeepFool and FGSM) while keeping classification accuracy high in most settings. The approach offers a new direction for future research on adversarial attacks and defenses.