Abstract: Adversarial training is extensively utilized to improve the adversarial robustness of deep neural networks. Yet, mitigating the degradation of standard generalization performance in adversarially trained models remains an open problem. This paper attempts to resolve this issue through the lens of model complexity. First, we leverage the Fisher-Rao norm, a geometrically invariant metric for model complexity, to establish non-trivial bounds on the Cross-Entropy Loss-based Rademacher complexity of a ReLU-activated Multi-Layer Perceptron. Then we generalize a complexity-related variable, which is sensitive to changes in model width and to the trade-off factors in adversarial training. Moreover, extensive empirical evidence validates that this variable correlates highly with the generalization gap of Cross-Entropy loss between adversarially trained and standard-trained models, especially during the initial and final phases of the training process. Building upon this observation, we propose a novel regularization framework, called Logit-Oriented Adversarial Training (LOAT), which can mitigate the trade-off between robustness and accuracy while imposing only a negligible increase in computational overhead. Our extensive experiments demonstrate that the proposed regularization strategy can boost the performance of the prevalent adversarial training algorithms, including PGD-AT, TRADES, TRADES (LSE), MART, and DM-AT, across various network architectures. Our code will be available at

What problem does this paper attempt to address?

The main problem this paper attempts to solve is: **how to alleviate the decline in standard generalization performance caused by adversarial training**. Specifically, from the perspective of model complexity, the authors use the Fisher-Rao norm, a geometrically invariant measure, to quantify model complexity and propose a new regularization framework, Logit-Oriented Adversarial Training (LOAT), to improve the trade-off between the standard accuracy and the robustness of adversarially trained models without significantly increasing the computational cost.

### Specific Problem Description

1. **Decline in Generalization Performance in Adversarial Training**:
   - Although adversarial training improves a model's robustness against adversarial examples, it often reduces the model's generalization performance on standard data. This is the "trade-off between robustness and accuracy".
   - Concretely, after adversarial training, the model's accuracy on the clean test set decreases to some extent.
2. **Limitations of Existing Methods**:
   - Previous studies have explained this phenomenon from different perspectives, such as bias introduction, insufficient data volume, and the local Lipschitz property, but most of them focus on a single factor and do not provide a unified theoretical explanation.

### Solutions in the Paper

1. **Introduction of the Fisher-Rao Norm**:
   - The Fisher-Rao norm is a geometrically invariant complexity measure that is suitable for multi-layer perceptron (MLP) models.
   - The authors use the Fisher-Rao norm to establish upper and lower bounds on the Rademacher complexity of the cross-entropy loss, and they identify a variable Γce, related to the model width and the adversarial training trade-off factor, that is closely related to the generalization gap.
2. **Proposing Logit-Oriented Adversarial Training (LOAT)**:
   - LOAT combines two regularization strategies: standard logit-oriented regularization and an adaptive adversarial logit-pairing strategy.
   - The regularization direction is adjusted at the beginning and at the end of training, respectively, to alleviate the trade-off between robustness and accuracy while minimizing the computational cost.

### Main Contributions

- **Theoretical Analysis**: Using the Fisher-Rao norm, the authors provide upper and lower bounds on the Rademacher complexity of the cross-entropy loss and reveal the relationship between the variable Γce and the generalization gap.
- **Experimental Verification**: Extensive experiments show that LOAT can significantly improve the performance of existing adversarial training algorithms (such as PGD-AT and TRADES) with very little increase in computational cost.

Through these methods, the authors provide a unified and effective approach to the decline in generalization performance in adversarial training.
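To make the complexity measure mentioned above more concrete, here is a minimal NumPy sketch of an empirical Fisher-Rao norm for a bias-free ReLU MLP under cross-entropy loss. It is not the paper's code. It assumes the identity known from the Fisher-Rao norm literature for such networks, ‖θ‖²_fr = (L+1)² · E[⟨∂ℓ/∂f, f(x)⟩²], where L is the number of hidden layers, together with the standard cross-entropy gradient ∂ℓ/∂f = softmax(f) − onehot(y). The function names (`relu_mlp_logits`, `fisher_rao_norm_ce`) and the random toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def relu_mlp_logits(x, weights):
    """Forward pass of a bias-free ReLU MLP; `weights` is a list of matrices."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(h @ W, 0.0)  # hidden layers use ReLU
    return h @ weights[-1]          # final layer is linear (logits)

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fisher_rao_norm_ce(x, y, weights):
    """Empirical Fisher-Rao norm under cross-entropy loss (assumed identity).

    Uses ||theta||_fr^2 = (L+1)^2 * mean(<dl/df, f>^2) with
    dl/df = softmax(f) - onehot(y), where L+1 is the number of weight matrices.
    """
    logits = relu_mlp_logits(x, weights)
    probs = softmax(logits)
    onehot = np.eye(logits.shape[1])[y]
    inner = np.sum((probs - onehot) * logits, axis=1)  # <dl/df, f> per sample
    return len(weights) * np.sqrt(np.mean(inner ** 2))

# Toy data: 32 samples, 5 features, 3 classes, two hidden layers of width 8.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 5))
y = rng.integers(0, 3, size=32)
weights = [rng.normal(size=(5, 8)), rng.normal(size=(8, 8)), rng.normal(size=(8, 3))]

norm_original = fisher_rao_norm_ce(x, y, weights)

# Rescaling one layer by c > 0 and the next by 1/c leaves the network function
# unchanged, so the measure should not change either.
rescaled = [weights[0] * 2.0, weights[1] * 0.5, weights[2]]
norm_rescaled = fisher_rao_norm_ce(x, y, rescaled)
print(norm_original, norm_rescaled)
```

The rescaling check at the end illustrates what "geometrically invariant" means here: two parameterizations that compute the same function receive the same complexity value, which a plain weight norm would not guarantee.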

Boosting Adversarial Training via Fisher-Rao Norm-based Regularization

GAAT: Group Adaptive Adversarial Training to Improve the Trade-Off Between Robustness and Accuracy

Feature Augmentation for Adversarial Robustness

GEAR: A Margin-based Federated Adversarial Training Approach

Regularization for Adversarial Robust Learning

Regularization properties of adversarially-trained linear regression

Attention-based investigation and solution to the trade-off issue of adversarial training

Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off

Understanding Robust Overfitting of Adversarial Training and Beyond

Alleviating Robust Overfitting of Adversarial Training With Consistency Regularization

RDAT: an efficient regularized decoupled adversarial training mechanism

Robust Single-step Adversarial Training with Regularizer

Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality

Improved Adversarial Training Through Adaptive Instance-wise Loss Smoothing

RAMP: Boosting Adversarial Robustness Against Multiple $l_p$ Perturbations for Universal Robustness

Lower Difficulty and Better Robustness: A Bregman Divergence Perspective for Adversarial Training

$\ell_\infty$-Robustness and Beyond: Unleashing Efficient Adversarial Training

On the Geometry of Regularization in Adversarial Training: High-Dimensional Asymptotics and Generalization Bounds

Improving Generalization of Adversarial Training via Robust Critical Fine-Tuning

Towards Robust Training of Neural Networks by Regularizing Adversarial Gradients

Stability Analysis and Generalization Bounds of Adversarial Training