Abstract: Despite the remarkable achievements of deep learning across various domains, its inherent vulnerability to adversarial examples remains a critical concern for practical deployment. Adversarial training has emerged as one of the most effective defensive techniques for improving model robustness against such malicious inputs. However, existing adversarial training schemes often generalize poorly to diverse underlying adversaries because they rely heavily on a point-by-point augmentation strategy that maps each clean example to its adversarial counterpart during training. In addition, adversarial examples can significantly disrupt the statistical information with respect to the target model, which introduces substantial uncertainty and makes the distribution of adversarial examples difficult to model. To address these issues, we propose a novel uncertainty-aware distributional adversarial training method, which enforces adversary modeling by leveraging both the statistical information of adversarial examples and the corresponding uncertainty estimates, with the goal of augmenting the diversity of adversaries. Considering the potentially negative impact of aligning adversaries to misclassified clean examples, we also refine the alignment reference based on its statistical proximity to clean examples during adversarial training, thereby reframing adversarial training as a distribution-to-distribution matching framework between the clean and adversarial domains. Furthermore, we design an introspective gradient alignment approach that matches input gradients between these domains without introducing external models. Extensive experiments across four benchmark datasets and various network architectures demonstrate that our approach achieves state-of-the-art adversarial robustness while maintaining natural performance.

Adversarial Training for Uncertainty Estimation in Cross-Lingual Text Classification

Boosting Cross-Lingual Transfer via Self-Learning with Uncertainty Estimation

Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding

Self-Training Sampling with Monolingual Data Uncertainty for Neural Machine Translation

Adversarial Neural Networks for Cross-lingual Sequence Tagging

Adversarial Training for Unsupervised Bilingual Lexicon Induction

Uncertainty-aware Self-training for Low-resource Neural Sequence Labeling

Neural Networks Against (and For) Self-Training: Classification with Small Labeled and Large Unlabeled Sets

Adversarial Training for Unknown Word Problems in Neural Machine Translation

MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model

Uncertainty-Aware Balancing for Multilingual and Multi-Domain Neural Machine Translation Training

Adversarial Training for Large Neural Language Models

Uncertainty-Aware Model Adaptation for Unsupervised Cross-Domain Object Detection

Unsupervised cross-domain named entity recognition using entity-aware adversarial training

Efficient dynamic feature adaptation for cross language sentiment analysis with biased adversarial training

Enhancing Adversarial Robustness via Uncertainty-Aware Distributional Adversarial Training

Cross-Lingual Supervision improves Large Language Models Pre-training

Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model

Unsupervised Domain Adaptation Via Contrastive Adversarial Domain Mixup: A Case Study on COVID-19

Self-training with dual uncertainty for semi-supervised medical image segmentation

Investigating cross-lingual training for offensive language detection