Abstract: Deep neural network models are vulnerable to adversarial attacks, such as gradient-based attacks. Even small perturbations can cause significant changes in their predictions. Adversarial training (AT) aims to improve a model's robustness against gradient attacks by generating adversarial samples and optimizing the model's adversarial training objective. Existing methods mainly focus on improving robust accuracy, balancing natural and robust accuracy, and suppressing robust overfitting. They rarely approach the AT problem from the characteristics of deep neural networks themselves, such as their stability properties under certain conditions. From a mathematical perspective, deep neural networks with stable training processes may be better able to suppress overfitting, as their training is smoother and avoids sudden drops in performance. We provide a proof of the existence of Ulam stability for deep neural networks. Ulam stability not only determines the existence of a solution to an operator inequality, but also provides an error bound between the exact and approximate solutions. The feature subspace of a deep neural network with Ulam stability can be accurately characterized and constrained by a function with special properties and a controlled error bound constant. This restricted feature subspace leads to a more stable training process. Based on these properties, we propose an adversarial training framework called Ulam stability adversarial training (US-AT). This framework can incorporate different Ulam stability conditions and benchmark AT models, optimize the construction of the optimal feature subspace, and consistently improve the model's robustness and training stability. US-AT is simple to use and can be easily integrated with existing multi-class AT models, such as GradAlign and TRADES. Experimental results show that US-AT methods consistently improve the robust accuracy and training stability of benchmark models.
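
For orientation, the two ideas the abstract combines can be written in their generic textbook forms. This is a sketch under stated assumptions, not the paper's own formulation: the symbols $f_\theta$, $\mathcal{L}$, $\delta$, $\epsilon$, $T$, $K$, and $\varepsilon$ are chosen here for illustration, and the specific Ulam stability conditions used by US-AT are not given in the abstract.

```latex
% Standard min-max form of adversarial training (assumed notation):
% f_\theta is the network, \mathcal{L} the loss, \delta a perturbation
% bounded in an \ell_p ball of radius \epsilon.
\[
  \min_{\theta} \; \mathbb{E}_{(x,y)}
  \Big[ \max_{\|\delta\|_p \le \epsilon}
        \mathcal{L}\big(f_\theta(x+\delta),\, y\big) \Big]
\]

% Generic Hyers-Ulam stability of an operator equation T(u) = v
% (assumed notation): every approximate solution lies within a
% controlled distance K\varepsilon of some exact solution.
\[
  \|T(u) - v\| \le \varepsilon
  \;\Longrightarrow\;
  \exists\, u^{*} :\; T(u^{*}) = v
  \ \text{ and } \
  \|u - u^{*}\| \le K\,\varepsilon
\]
```

Read this way, the constant $K$ plays the role of the "controlled error bound constant" mentioned in the abstract: it limits how far an approximate solution of the operator inequality can sit from an exact one.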

Understanding and Mitigating Robust Overfitting through the Lens of Feature Dynamics

Attack As Defense: Characterizing Adversarial Examples Using Robustness

Balance, Imbalance, and Rebalance: Understanding Robust Overfitting from a Minimax Game Perspective

Understanding Robust Overfitting from the Feature Generalization Perspective

Understanding Robust Overfitting of Adversarial Training and Beyond

Adversarial Distributional Training for Robust Deep Learning

Enhancing Adversarial Robustness through Stable Adversarial Training

Feature Augmentation for Adversarial Robustness

Hyper Adversarial Tuning for Boosting Adversarial Robustness of Pretrained Large Vision Models

Theoretical Analysis of Robust Overfitting for Wide DNNs: An NTK Approach

Strength-Adaptive Adversarial Training

Alleviating Robust Overfitting of Adversarial Training With Consistency Regularization

Overfitting in adversarially robust deep learning

Dynamic Label Adversarial Training for Deep Learning Robustness Against Adversarial Attacks

The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness

Mitigating Low-Frequency Bias: Feature Recalibration and Frequency Attention Regularization for Adversarial Robustness

Improved Adversarial Training Through Adaptive Instance-wise Loss Smoothing

Push Stricter to Decide Better: A Class-Conditional Feature Adaptive Framework for Improving Adversarial Robustness

Enhancing Adversarial Training with Feature Separability

Rethinking the Effect of Data Augmentation in Adversarial Contrastive Learning

Towards Deep Learning Models Resistant to Transfer-based Adversarial Attacks via Data-centric Robust Learning