Abstract: Adversarial training (AT) constructs robust neural networks by incorporating adversarial perturbations into natural data. However, it is plagued by the issue of robust overfitting (RO), which severely damages the model's robustness. In this paper, we investigate RO from a novel feature generalization perspective. Specifically, we design factor ablation experiments to assess the respective impacts of natural data and adversarial perturbations on RO, identifying that the inducing factor of RO stems from natural data. Given that the only difference between adversarial and natural training lies in the inclusion of adversarial perturbations, we further hypothesize that adversarial perturbations degrade the generalization of features in natural data and verify this hypothesis through extensive experiments. Based on these findings, we provide a holistic view of RO from the feature generalization perspective and explain various empirical behaviors associated with RO. To examine our feature generalization perspective, we devise two representative methods, attack strength and data augmentation, to prevent the feature generalization degradation during AT. Extensive experiments conducted on benchmark datasets demonstrate that the proposed methods can effectively mitigate RO and enhance adversarial robustness.

What problem does this paper attempt to address?

The problem this paper addresses is **robust overfitting (RO) in adversarial training (AT)**. Specifically, the authors focus on the phenomenon that, during adversarial training, the model's test robustness declines as training progresses. This phenomenon seriously impairs the robustness of the model and is prevalent across datasets, network architectures, and AT variants.

### Main contributions of the paper:

1. **Identify the cause of RO**: Through factor ablation experiments, the authors find that RO is induced mainly by natural data rather than by adversarial perturbations.
2. **Propose a hypothesis and verify it**: The authors further hypothesize that adversarial perturbations degrade the generalization of features in natural data, and verify this hypothesis through extensive experiments.
3. **Provide a holistic view from the feature generalization perspective**: Based on these findings, the authors give a holistic account of RO in terms of feature generalization and explain various empirical behaviors associated with RO.
4. **Propose two methods to alleviate RO**: To examine the feature generalization perspective, the authors devise two representative methods, attack strength and data augmentation, and confirm their effectiveness experimentally.

### Key concepts and formulas:

- **Objective function of natural training**:
  \[ \min_{\theta} \frac{1}{n} \sum_{i = 1}^n \ell(f_\theta(x_i), y_i) \]
  where \( f_\theta \) is a network with parameters \( \theta \), \( x_i \) is an input sample, \( y_i \) is the corresponding label, and \( \ell \) is a loss function.
- **Objective function of adversarial training**:
  \[ \min_{\theta} \frac{1}{n} \sum_{i = 1}^n \max_{\delta_i \in \Delta} \ell(f_\theta(x_i+\delta_i), y_i) \]
  where \( \delta_i \) is an adversarial perturbation constrained by a predefined budget \( \Delta = \{\delta : \|\delta\|_p\leq\epsilon\} \).
- **Adversarial perturbation generation**:
  \[ \delta_k=\Pi_\Delta(\alpha\cdot\text{sign}(\nabla_x \ell(f_\theta(x + \delta_{k - 1}), y))+\delta_{k - 1}) \]
  where \( \Pi_\Delta \) is a projection operator that keeps the perturbation within the budget, \( \alpha \) is the step size, and \( k \) indexes the attack iteration.

### Experimental results:

1. **Factor ablation experiment**: The authors ablate natural data and adversarial perturbations during training. Severe RO persists when only the adversarial perturbations are removed, whereas RO is significantly alleviated when natural data and adversarial perturbations are removed together. Since the two settings differ only in whether the natural data is retained, this indicates that natural data is the main cause of RO.
2. **Attack strength and data augmentation experiment**: Adjusting the strength of the adversarial perturbations and applying data augmentation both prevent the degradation of feature generalization, thereby alleviating RO.

In summary, through an in-depth analysis of robust overfitting in adversarial training, this paper offers a new understanding of the problem and methods to mitigate it, providing a theoretical basis and practical techniques for improving the robustness of adversarially trained models.
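The adversarial perturbation generation rule and the adversarial training objective listed under the key concepts can be illustrated with a small, dependency-free sketch. It is not the paper's implementation: it assumes a linear binary classifier with logistic loss (so the input gradient has a closed form) and an \( \ell_\infty \) budget, and every function name (`pgd_perturbation`, `adversarial_risk`, and so on) is hypothetical and chosen only for illustration.

```python
import math


def sign(v):
    """Return -1, 0, or +1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def logistic_loss(w, b, x, y):
    """Loss l(f(x), y) for an assumed linear model f(x) = w.x + b, y in {-1, +1}."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return math.log1p(math.exp(-margin))


def loss_grad_x(w, b, x, y):
    """Closed-form gradient of the logistic loss with respect to the input x."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    coeff = -y / (1.0 + math.exp(margin))
    return [coeff * wi for wi in w]


def project_linf(delta, eps):
    """Projection onto the budget {delta : ||delta||_inf <= eps} (assumes p = inf)."""
    return [max(-eps, min(eps, d)) for d in delta]


def pgd_perturbation(w, b, x, y, eps, alpha, steps):
    """Iterate delta_k = Proj(alpha * sign(grad_x loss(x + delta_{k-1})) + delta_{k-1})."""
    delta = [0.0] * len(x)  # start from the natural input (zero perturbation)
    for _ in range(steps):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        grad = loss_grad_x(w, b, x_adv, y)
        stepped = [di + alpha * sign(g) for di, g in zip(delta, grad)]
        delta = project_linf(stepped, eps)
    return delta


def adversarial_risk(w, b, data, eps, alpha, steps):
    """Average loss on PGD-perturbed inputs: the inner max, approximated by PGD."""
    total = 0.0
    for x, y in data:
        delta = pgd_perturbation(w, b, x, y, eps, alpha, steps)
        x_adv = [xi + di for xi, di in zip(x, delta)]
        total += logistic_loss(w, b, x_adv, y)
    return total / len(data)


if __name__ == "__main__":
    w, b = [1.0, -2.0], 0.0          # illustrative parameters, not from the paper
    data = [([0.5, 0.5], 1), ([-1.0, 0.2], -1)]
    natural = sum(logistic_loss(w, b, x, y) for x, y in data) / len(data)
    robust = adversarial_risk(w, b, data, eps=0.1, alpha=0.05, steps=5)
    print("natural risk:", natural)
    print("adversarial risk:", robust)
```

In full adversarial training, the outer minimization would update \( \theta \) (here `w` and `b`) by gradient descent on the loss evaluated at the perturbed inputs. That step is omitted to keep the sketch short.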

Understanding Robust Overfitting from the Feature Generalization Perspective

Feature Augmentation for Adversarial Robustness

Understanding Robust Overfitting of Adversarial Training and Beyond

Understanding and Mitigating Robust Overfitting through the Lens of Feature Dynamics

Towards Understanding Clean Generalization and Robust Overfitting in Adversarial Training

Adversarial Masking: Towards Understanding Robustness Trade-off for Generalization

Over-parameterization and Adversarial Robustness in Neural Networks: An Overview and Empirical Analysis

Can overfitted deep neural networks in adversarial training generalize? -- An approximation viewpoint

Enhancing Robust Representation in Adversarial Training: Alignment and Exclusion Criteria

Robustness, Privacy, and Generalization of Adversarial Training

The curse of overparametrization in adversarial training: Precise analysis of robust generalization for random features regression

ROBY: Evaluating the adversarial robustness of a deep model by its decision boundaries

Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data

Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning

Overfitting in adversarially robust deep learning

Exploring Robust Features for Improving Adversarial Robustness

Inter-feature Relationship Certifies Robust Generalization of Adversarial Training

The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness

Adversarial Training on Purification (AToP): Advancing Both Robustness and Generalization

Exploring the Relationship Between Architectural Design and Adversarially Robust Generalization

Soften to Defend: Towards Adversarial Robustness via Self-Guided Label Refinement