Abstract: Adversarial training is extensively utilized to improve the adversarial robustness of deep neural networks. Yet, mitigating the degradation of standard generalization performance in adversarial-trained models remains an open problem. This paper attempts to resolve this issue through the lens of model complexity. First, we leverage the Fisher-Rao norm, a geometrically invariant metric for model complexity, to establish non-trivial bounds on the Cross-Entropy Loss-based Rademacher complexity for a ReLU-activated Multi-Layer Perceptron. Then we generalize a complexity-related variable, which is sensitive to the changes in model width and the trade-off factors in adversarial training. Moreover, intensive empirical evidence validates that this variable highly correlates with the generalization gap of Cross-Entropy loss between adversarial-trained and standard-trained models, especially during the initial and final phases of the training process. Building upon this observation, we propose a novel regularization framework, called Logit-Oriented Adversarial Training (LOAT), which can mitigate the trade-off between robustness and accuracy while imposing only a negligible increase in computational overhead. Our extensive experiments demonstrate that the proposed regularization strategy can boost the performance of the prevalent adversarial training algorithms, including PGD-AT, TRADES, TRADES (LSE), MART, and DM-AT, across various network architectures. Our code will be available at
What problem does this paper attempt to address?
The main problem this paper attempts to solve is: **how to alleviate the decline in standard generalization performance caused by adversarial training**. Specifically, from the perspective of model complexity, the authors use the Fisher-Rao norm, a geometrically invariant metric, to measure model complexity and propose a new regularization framework, Logit-Oriented Adversarial Training (LOAT), to improve the trade-off between standard accuracy and robustness of adversarially trained models without significantly increasing computational cost.
### Specific Problem Description
1. **Decline in Generalization Performance in Adversarial Training**:
- Although adversarial training improves a model's robustness against adversarial examples, it often reduces the model's generalization performance on standard data. This is the "trade-off between robustness and accuracy".
- Concretely, after adversarial training, the model's accuracy on the clean test set decreases to some extent.
2. **Limitations of Existing Methods**:
- Previous studies have explained this phenomenon from different perspectives, such as bias introduction, insufficient data volume, local Lipschitz property, etc., but most of these studies have focused on a single factor and failed to provide a unified theoretical explanation.
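The trade-off described above is often made explicit in the training objective itself. As an illustration only, the TRADES objective (one of the baselines named in the abstract) is commonly written in the form below. The notation is chosen here for readability and is not taken from the paper: $f_\theta$ is the network, $p_\theta(\cdot \mid x)$ its softmax output, $\epsilon$ the perturbation budget, and $\beta$ the trade-off factor.

```latex
\min_{\theta}\;
\mathbb{E}_{(x,y)}\Big[
  \underbrace{\mathrm{CE}\big(f_\theta(x),\, y\big)}_{\text{standard accuracy}}
  \;+\;
  \beta \cdot
  \underbrace{\max_{\|x' - x\| \le \epsilon}
    \mathrm{KL}\big(p_\theta(\cdot \mid x)\,\big\|\,p_\theta(\cdot \mid x')\big)}_{\text{robustness}}
\Big]
```

A larger $\beta$ puts more weight on the robustness term and typically lowers clean accuracy, while a smaller $\beta$ does the opposite.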
### Solutions in the Paper
1. **Introduction of the Fisher-Rao Norm**:
- The Fisher-Rao norm is a geometrically invariant complexity measure, which is suitable for multi-layer perceptron (MLP) models.
- Using the Fisher-Rao norm, the authors establish upper and lower bounds on the Rademacher complexity of the cross-entropy loss and identify a variable Γce that depends on the model width and the adversarial training trade-off factor and is closely related to the generalization gap.
2. **Proposing Logit-Oriented Adversarial Training (LOAT)**:
- LOAT combines two regularization strategies: standard logit-oriented regularization and an adaptive adversarial logit-pairing strategy.
- The regularization direction is adjusted at the beginning and at the end of training respectively, which alleviates the trade-off between robustness and accuracy while keeping the added computational cost small.
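To make the complexity measure in item 1 more concrete, here is a minimal sketch of an empirical Fisher-Rao norm under cross-entropy loss. It relies on the known characterization for bias-free ReLU networks, where the squared norm equals the squared number of weight layers times the mean of ⟨∂ℓ/∂f, f(x)⟩². The function names, the `num_weight_layers` parameter, and the use of observed labels (the empirical variant) are assumptions made for illustration. This is not the paper's code, and it does not compute Γce.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def empirical_fisher_rao_norm(logits_batch, labels, num_weight_layers):
    """Empirical Fisher-Rao norm for cross-entropy loss (illustrative sketch).

    Assumes a bias-free ReLU network, so the output is homogeneous in the
    weights of each layer. Under that assumption:
        ||theta||_fr^2 = num_weight_layers^2 * mean(<dCE/df, f(x)>^2)
    `num_weight_layers` is a hypothetical parameter name: the number of
    weight matrices in the network.
    """
    acc = 0.0
    for logits, y in zip(logits_batch, labels):
        probs = softmax(logits)
        # Gradient of cross-entropy w.r.t. logits: softmax(logits) - one_hot(y)
        inner = sum(
            (p - (1.0 if k == y else 0.0)) * z
            for k, (p, z) in enumerate(zip(probs, logits))
        )
        acc += inner ** 2
    mean_sq = acc / len(labels)
    return num_weight_layers * math.sqrt(mean_sq)
```

Only logits and labels are needed, so the quantity can be tracked during training with one extra pass over the already computed model outputs, for example `empirical_fisher_rao_norm(batch_logits, batch_labels, num_weight_layers=3)`.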
### Main Contributions
- **Theoretical Analysis**: Using the Fisher-Rao norm, the authors derive upper and lower bounds on the Rademacher complexity of the cross-entropy loss and reveal the relationship between the variable Γce and the generalization gap.
- **Experimental Verification**: Extensive experiments show that LOAT can boost the performance of existing adversarial training algorithms (such as PGD-AT, TRADES, etc.) with very little increase in computational cost.
Through these methods, the authors offer a unified approach to mitigating the decline in standard generalization performance under adversarial training.