Dynamic Label Adversarial Training for Deep Learning Robustness Against Adversarial Attacks

Zhenyu Liu, Haoran Duan, Huizhi Liang, Yang Long, Vaclav Snasel, Guiseppe Nicosia, Rajiv Ranjan, Varun Ojha
2024-08-23
Abstract: Adversarial training is one of the most effective methods for enhancing model robustness. Recent approaches incorporate adversarial distillation into adversarial training architectures. However, we identify two issues that limit the performance of these defense methods: (1) previous methods primarily use static ground truth for adversarial training, which often causes robust overfitting; (2) the loss functions are either Mean Squared Error or KL-divergence, leading to sub-optimal clean accuracy. To solve these problems, we propose a dynamic label adversarial training (DYNAT) algorithm that enables the target model to gradually and dynamically gain robustness from the guide model's decisions. Additionally, we found that a budgeted dimension of inner optimization for the target model may contribute to the trade-off between clean accuracy and robust accuracy. Therefore, we propose a novel inner optimization method to be incorporated into adversarial training. This enables the target model to adaptively search for adversarial examples based on dynamic labels from the guide model, contributing to the robustness of the target model. Extensive experiments validate the superior performance of our approach.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### The Problem Addressed by the Paper

This paper addresses the robustness of deep learning models against adversarial attacks. Specifically, the authors point out two main limitations of existing adversarial training methods:

1. **Static Label Problem**: Existing adversarial training methods mainly use static ground truth labels, which often leads to robust overfitting.
2. **Loss Function Problem**: Existing methods use either Mean Squared Error (MSE) or KL divergence as the loss function, which yields sub-optimal accuracy on clean samples.

To overcome these issues, the authors propose a Dynamic Label Adversarial Training (DYNAT) algorithm, in which the target model gradually and dynamically gains robustness from the decisions of a guide model. The authors also observe that a budgeted dimension of inner optimization for the target model may contribute to the trade-off between clean accuracy and robust accuracy. They therefore propose a new inner optimization method in which the target model adaptively searches for adversarial examples based on the guide model's dynamic labels. According to the authors, these changes improve both robustness under adversarial attacks and classification performance on clean samples.
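To make the idea of searching for adversarial examples against a guide model's labels more concrete, here is a minimal sketch in pure Python. It is not the authors' DYNAT implementation, and the summary above does not specify their loss or inner optimization. Everything here is an assumption made for illustration: the two linear softmax classifiers (`W_target`, `W_guide`), the soft cross-entropy loss, the choice to take the guide's label on the clean input, and the use of a standard L-infinity projected gradient ascent (PGD-style) loop with hypothetical parameters `eps`, `step`, and `iters`.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, x):
    """Class probabilities of a linear softmax classifier with weight rows W."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in W])

def soft_cross_entropy(label, probs):
    """Cross-entropy between a soft label and predicted probabilities."""
    return -sum(t * math.log(max(p, 1e-12)) for t, p in zip(label, probs))

def input_gradient(W, x, label):
    # For a linear softmax model, d/dx CE(label, softmax(Wx)) = W^T (p - label),
    # treating the label as a constant.
    p = predict(W, x)
    diff = [pk - tk for pk, tk in zip(p, label)]
    return [sum(W[k][j] * diff[k] for k in range(len(W))) for j in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def pgd_against_guide(W_target, W_guide, x, eps=0.1, step=0.025, iters=10):
    """Search for a perturbation of x, within an L-infinity ball of radius eps,
    that increases the target model's loss against the guide model's soft label.
    Assumption: the label is the guide's prediction on the clean input, so it
    changes whenever the guide model changes, unlike a fixed ground-truth label."""
    label = predict(W_guide, x)
    x_adv = list(x)
    for _ in range(iters):
        grad = input_gradient(W_target, x_adv, label)
        # Signed gradient ascent step on the target model's loss.
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back into the eps-ball around the clean input.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv, label

# Toy usage with made-up weights and input (illustrative values only).
W_target = [[1.0, -1.0], [-1.0, 1.0]]
W_guide = [[3.0, 0.0], [0.0, 3.0]]
x = [0.5, 0.2]
x_adv, label = pgd_against_guide(W_target, W_guide, x)
clean_loss = soft_cross_entropy(label, predict(W_target, x))
adv_loss = soft_cross_entropy(label, predict(W_target, x_adv))
```

In a full training loop, the outer step would then update the target model on `x_adv` using the same soft label. Because that label comes from the guide model rather than from a fixed one-hot vector, it can shift over the course of training, which is the contrast with static ground truth that the summary draws.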