Abstract: While adversarial training methods have resulted in significant improvements in the deep neural nets' robustness against norm-bounded adversarial perturbations, their generalization performance from training samples to test data has been shown to be considerably worse than standard empirical risk minimization methods. Several recent studies seek to connect the generalization behavior of adversarially trained classifiers to various gradient-based min-max optimization algorithms used for their training. In this work, we study the generalization performance of adversarial training methods using the algorithmic stability framework. Specifically, our goal is to compare the generalization performance of the vanilla adversarial training scheme fully optimizing the perturbations at every iteration vs. the free adversarial training simultaneously optimizing the norm-bounded perturbations and classifier parameters. Our proven generalization bounds indicate that the free adversarial training method could enjoy a lower generalization gap between training and test samples due to the simultaneous nature of its min-max optimization algorithm. We perform several numerical experiments to evaluate the generalization performance of vanilla, fast, and free adversarial training methods. Our empirical findings also show the improved generalization performance of the free adversarial training method and further demonstrate that the better generalization result could translate to greater robustness against black-box attack schemes. The code is available at
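
For orientation, the two quantities the abstract refers to can be written in standard notation. The symbols here ($w$ for classifier parameters, $\delta$ for the perturbation, $\epsilon$ for the norm bound, $\ell$ for the loss, $S$ for the $n$ training samples, $\mathcal{D}$ for the data distribution) are assumptions chosen for illustration, not notation taken from the paper.

```latex
% Adversarial training as a min-max problem over n training samples
\min_{w} \; \frac{1}{n} \sum_{i=1}^{n} \max_{\|\delta_i\| \le \epsilon} \ell\bigl(w;\, x_i + \delta_i,\, y_i\bigr)

% Adversarial generalization gap of a trained classifier w:
% expected worst-case loss on fresh data minus average worst-case loss on S
\mathrm{gap}(w) =
  \mathbb{E}_{(x,y) \sim \mathcal{D}}\Bigl[\max_{\|\delta\| \le \epsilon} \ell\bigl(w;\, x + \delta,\, y\bigr)\Bigr]
  - \frac{1}{n} \sum_{i=1}^{n} \max_{\|\delta_i\| \le \epsilon} \ell\bigl(w;\, x_i + \delta_i,\, y_i\bigr)
```

Vanilla adversarial training approximately solves the inner maximization before each update of $w$, whereas free adversarial training takes gradient steps on $w$ and $\delta$ together.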

What problem does this paper attempt to address?

This paper addresses the observation that, although Adversarial Training (AT) improves the robustness of deep neural networks against adversarial perturbations, the resulting models generalize considerably worse from training to test data than models trained by standard methods. Specifically, the paper uses the algorithmic stability framework to analyze the generalization performance of different adversarial training methods, with a focus on "Free Adversarial Training" (Free AT). Compared with the traditional "Vanilla Adversarial Training" (Vanilla AT), which fully optimizes the perturbations at every iteration, Free AT updates the model parameters and the adversarial perturbations simultaneously in each iteration, which can potentially reduce the generalization gap between the training set and the test set.

### Main contributions of the paper:

1. **Theoretical analysis**: The generalization behavior of the Free AT algorithm is analyzed using the algorithmic stability framework, and theoretical generalization error bounds are provided.
2. **Experimental verification**: The generalization performance of Vanilla AT, Fast AT, and Free AT is compared on standard computer vision datasets through numerical experiments. The results show that Free AT generalizes better than Vanilla AT and Fast AT.
3. **Robustness under black-box attacks**: Experimental results show that models trained by Free AT achieve higher test accuracy under black-box attacks.
4. **Free-TRADES**: Free-TRADES is proposed, which applies the idea of Free AT to the TRADES algorithm and improves generalization performance by optimizing the minimization and maximization variables simultaneously.

### Key technical points:

- **Adversarial training**: Adversarial samples are introduced during training to improve the robustness of the model against adversarial attacks.
- **Algorithmic stability**: The generalization ability of the model, and of adversarial training methods in particular, is analyzed using the algorithmic stability framework.
- **Free AT**: In each iteration, the model parameters and adversarial perturbations are updated simultaneously to reduce the generalization gap.
- **TRADES**: An adversarial training method that balances the accuracy and robustness of the model through an alternative loss function.

### Experimental results:

- **Generalization gap**: The generalization gap between the training set and the test set for Free AT is significantly smaller than that of Vanilla AT and Fast AT.
- **Black-box attacks**: Models trained by Free AT show better robustness under black-box attacks.
- **Different numbers of training samples**: As the number of training samples increases, the generalization gap of Free AT decreases more rapidly, which is consistent with the theoretical analysis.

### Conclusion:

Through theoretical analysis and experimental verification, the paper shows the advantages of Free AT in improving the generalization performance of adversarially trained models, especially in reducing the generalization gap between the training set and the test set. In addition, the results for Free-TRADES further support the effectiveness of optimizing the minimization and maximization variables simultaneously. These results suggest new directions for improving the generalization ability and robustness of adversarially trained models.
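
The difference between the two update schedules summarized above can be sketched on a toy problem. The following is a minimal illustration, not the paper's code: it assumes a linear logistic-regression classifier, L-infinity perturbations, and synthetic Gaussian data, and every function name and hyperparameter (`make_data`, `vanilla_at`, `free_at`, `robust_error`, step sizes, batch size) is hypothetical. Carrying the perturbation over between minibatches in `free_at` follows the usual description of free adversarial training and is likewise an assumption here.

```python
import numpy as np

def make_data(n, dim, rng):
    """Synthetic binary data: labels in {-1, +1}, class means at +/- 0.5."""
    y = rng.choice([-1.0, 1.0], size=n)
    x = 0.5 * y[:, None] + rng.normal(size=(n, dim))
    return x, y

def grads(w, x, y, delta):
    """Gradients of the logistic loss on perturbed inputs x + delta."""
    margin = y * ((x + delta) @ w)
    # d loss_i / d logit_i = -y_i * sigmoid(-margin_i), computed stably via tanh
    coeff = -y * 0.5 * (1.0 - np.tanh(margin / 2.0))
    grad_w = (x + delta).T @ coeff / len(y)   # averaged over the batch
    grad_delta = np.outer(coeff, w)           # one row per sample
    return grad_w, grad_delta

def vanilla_at(x, y, eps, lr, epochs, pgd_steps, batch=32, seed=0):
    """Vanilla AT: several ascent steps on delta, then ONE descent step on w."""
    rng = np.random.default_rng(seed)
    w = np.zeros(x.shape[1])
    for _ in range(epochs):
        perm = rng.permutation(len(y))
        for start in range(0, len(y), batch):
            idx = perm[start:start + batch]
            xb, yb = x[idx], y[idx]
            delta = np.zeros_like(xb)
            for _ in range(pgd_steps):        # inner maximization (PGD)
                _, g_d = grads(w, xb, yb, delta)
                delta = np.clip(delta + 0.5 * eps * np.sign(g_d), -eps, eps)
            g_w, _ = grads(w, xb, yb, delta)
            w -= lr * g_w
    return w

def free_at(x, y, eps, lr, epochs, replays, batch=32, seed=0):
    """Free AT: each replay of a minibatch updates w AND delta from one gradient pass."""
    rng = np.random.default_rng(seed)
    w = np.zeros(x.shape[1])
    delta = np.zeros((batch, x.shape[1]))     # carried over between minibatches
    for _ in range(epochs):
        perm = rng.permutation(len(y))
        for start in range(0, len(y), batch):
            idx = perm[start:start + batch]
            xb, yb = x[idx], y[idx]
            for _ in range(replays):
                d = delta[:len(idx)]
                g_w, g_d = grads(w, xb, yb, d)
                w -= lr * g_w                 # descent on parameters
                delta[:len(idx)] = np.clip(d + eps * np.sign(g_d), -eps, eps)  # ascent
    return w

def robust_error(w, x, y, eps):
    """Exact worst-case error for a linear model: the L-infinity attack
    lowers every margin by eps * ||w||_1."""
    return float(np.mean(y * (x @ w) - eps * np.abs(w).sum() <= 0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_tr, y_tr = make_data(200, 10, rng)
    x_te, y_te = make_data(2000, 10, rng)
    for name, w in [("vanilla", vanilla_at(x_tr, y_tr, 0.1, 0.1, 5, pgd_steps=3)),
                    ("free", free_at(x_tr, y_tr, 0.1, 0.1, 5, replays=3))]:
        gap = robust_error(w, x_te, y_te, 0.1) - robust_error(w, x_tr, y_tr, 0.1)
        print(f"{name}: robust generalization gap = {gap:.3f}")
```

On a problem this small the measured gaps are noisy and should not be read as evidence for or against the paper's claims; the sketch is only meant to make the sequential (Vanilla AT) versus simultaneous (Free AT) update order concrete.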

Stability and Generalization in Free Adversarial Training
