Abstract: Adversarial training (AT) refers to integrating adversarial examples -- inputs altered with imperceptible perturbations that can significantly impact model predictions -- into the training process. Recent studies have demonstrated the effectiveness of AT in improving the robustness of deep neural networks against diverse adversarial attacks. However, a comprehensive overview of these developments is still missing. This survey addresses this gap by reviewing a broad range of recent and representative studies. Specifically, we first describe the implementation procedures and practical applications of AT, followed by a comprehensive review of AT techniques from three perspectives: data enhancement, network design, and training configurations. Lastly, we discuss common challenges in AT and propose several promising directions for future research.
What problem does this paper attempt to address?
The paper addresses the following problem: **the lack of a comprehensive review of how Adversarial Training (AT) improves the robustness of deep neural networks against adversarial attacks, and of the methods used to do so**. It aims to fill this gap by surveying a broad range of recent and representative studies.
### Specific description of the problem
1. **Impact of adversarial samples**:
Adversarial samples are inputs to which small, imperceptible perturbations have been added; these perturbations can significantly change the model's predictions. For example, a slightly perturbed image may cause a classification model to assign it to a completely different category (as shown in Figure 1).
2. **Effectiveness of adversarial training**:
Recent research has shown that adversarial training can effectively improve the robustness of deep neural networks against multiple adversarial attacks. However, there is currently a lack of a comprehensive review of these developments, making it difficult for researchers to systematically understand and apply these techniques.
3. **Deficiencies in existing literature**:
Although adversarial training has been widely used in multiple fields (such as medical image segmentation, autonomous driving, anomaly detection, etc.), the existing literature fails to provide a comprehensive framework to summarize and classify these techniques.
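To make the first point concrete, here is a minimal sketch of how a small, bounded perturbation can flip a prediction. It is not taken from the paper: the linear classifier, its weights `w`, the input `x`, and the budget `eps = 0.02` are all hypothetical values chosen for illustration, and the perturbation rule is a simple sign-based step in the spirit of fast-gradient-sign attacks.

```python
def predict(w, b, x):
    """Return the class (0 or 1) of a linear classifier: 1 if w.x + b > 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else 0

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def linf_perturb(w, x, y, eps):
    """Move every feature by at most eps against the true class y, then clip to [0, 1]."""
    direction = -1 if y == 1 else 1  # lower the score for class 1, raise it for class 0
    return [min(1.0, max(0.0, xi + direction * eps * sign(wi)))
            for wi, xi in zip(w, x)]

# Hypothetical toy model and input (assumptions, not from the paper).
w = [1.0, -1.0, 0.5, -0.5]
b = 0.0
x = [0.52, 0.50, 0.52, 0.50]

y = predict(w, b, x)
x_adv = linf_perturb(w, x, y, eps=0.02)
print("clean prediction:", y)
print("adversarial prediction:", predict(w, b, x_adv))
print("largest feature change:", max(abs(a - c) for a, c in zip(x_adv, x)))
```

Each feature moves by no more than `eps`, yet the changes all push the score in the same direction, which is why a perturbation that is tiny per feature can still cross the decision boundary.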
### Goals of the paper
To remedy this deficiency, the paper reviews adversarial training along the following lines:
- **Implementation process and practical applications**: Describe the specific implementation steps of adversarial training and its applications in practice.
- **Technical classification**: Classify and review adversarial training techniques in detail from three perspectives (data enhancement, network design, training configurations).
- **Challenges and future directions**: Discuss common challenges in adversarial training and propose potential directions for future research.
### Formula representation
Adversarial training is usually formulated as a min-max optimization problem:
\[
\min_{\theta} \mathbb{E}_{(x,y) \sim D} \left[ \max_{\delta \in B(x,\epsilon)} \ell(x+\delta, y; \theta) \right]
\]
where:
- $\theta$ represents model parameters,
- $(x, y)$ represents training data sampled from the data distribution $D$,
- $\ell(x+\delta, y; \theta)$ represents the loss value calculated using the adversarial sample $x + \delta$ and its true label $y$,
- $\delta$ represents the adversarial perturbation, which is a small perturbation that is imperceptible to humans but can significantly degrade the model performance,
- $B(x, \epsilon)$ is the set of allowed perturbations, defined as:
\[
B(x, \epsilon) = \{\delta \mid x+\delta \in [0,1],\ \|\delta\|_p \leq \epsilon\}
\]
where $\epsilon$ is the maximum perturbation magnitude and $\|\delta\|_p$ measures the perturbation size using the $p$-norm. All pixels are normalized to the range $[0,1]$, so the constraint $x+\delta \in [0,1]$ keeps the perturbed input a valid image.
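The min-max formulation above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the survey's implementation: it assumes a logistic-regression model, the case $p = \infty$, projected gradient ascent with signed steps as an approximation of the inner maximization, and full-batch gradient descent for the outer minimization. The function names, the toy dataset, and all hyperparameters (`eps`, `step`, `n_steps`, `lr`, `epochs`) are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grads(w, b, x, y):
    """Binary cross-entropy of a logistic model, with gradients w.r.t. w, b, and x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    p = min(max(p, 1e-12), 1.0 - 1e-12)           # avoid log(0)
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    err = p - y                                    # d(loss)/d(logit)
    return loss, [err * xi for xi in x], err, [err * wi for wi in w]

def project(x, x_adv, eps):
    """Force delta = x_adv - x into B(x, eps) for p = infinity."""
    out = []
    for xi, ai in zip(x, x_adv):
        ai = min(xi + eps, max(xi - eps, ai))      # ||delta||_inf <= eps
        out.append(min(1.0, max(0.0, ai)))         # x + delta in [0, 1]
    return out

def pgd_attack(w, b, x, y, eps, step, n_steps):
    """Approximate the inner max with projected, signed gradient ascent on the loss."""
    x_adv = list(x)
    for _ in range(n_steps):
        gx = loss_and_grads(w, b, x_adv, y)[3]
        x_adv = [ai + step * ((g > 0) - (g < 0)) for ai, g in zip(x_adv, gx)]
        x_adv = project(x, x_adv, eps)
    return x_adv

def adversarial_train(data, eps=0.1, lr=0.5, epochs=200):
    """Outer min: gradient descent on the loss evaluated at adversarial inputs."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw_sum, gb_sum = [0.0] * dim, 0.0
        for x, y in data:
            x_adv = pgd_attack(w, b, x, y, eps, step=eps / 4, n_steps=10)
            _, gw, gb, _ = loss_and_grads(w, b, x_adv, y)
            gw_sum = [s + g for s, g in zip(gw_sum, gw)]
            gb_sum += gb
        w = [wi - lr * s / len(data) for wi, s in zip(w, gw_sum)]
        b -= lr * gb_sum / len(data)
    return w, b

def robust_loss(w, b, data, eps):
    """Average loss on adversarial inputs, i.e. the empirical min-max objective."""
    total = 0.0
    for x, y in data:
        x_adv = pgd_attack(w, b, x, y, eps, step=eps / 4, n_steps=10)
        total += loss_and_grads(w, b, x_adv, y)[0]
    return total / len(data)

# Hypothetical toy data: two features in [0, 1], two well-separated classes.
data = [([0.2, 0.2], 0), ([0.1, 0.3], 0), ([0.3, 0.1], 0),
        ([0.8, 0.8], 1), ([0.9, 0.7], 1), ([0.7, 0.9], 1)]
w, b = adversarial_train(data)
print("weights:", w, "bias:", b)
print("robust loss:", robust_loss(w, b, data, eps=0.1))
```

The `project` function mirrors the two constraints in the definition of $B(x, \epsilon)$: it first limits each coordinate of $\delta$ to $[-\epsilon, \epsilon]$ and then clips $x + \delta$ to $[0,1]$. For deep networks the gradients would come from automatic differentiation rather than the closed-form expressions used here.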
### Conclusion
By systematically summarizing and classifying adversarial training techniques, this paper provides a comprehensive reference framework for researchers and points out potential directions for future research. This helps promote the development and application of adversarial training techniques, especially in improving the robustness and security of deep learning models.