Effective and Robust Adversarial Training against Data and Label Corruptions

Peng-Fei Zhang, Zi Huang, Xin-Shun Xu, Guangdong Bai
2024-05-07
Abstract: Corruptions due to data perturbations and label noise are prevalent in datasets from unreliable sources and pose significant threats to model training. Despite existing efforts to develop robust models, current learning methods commonly overlook the possible co-existence of both corruptions, which limits the effectiveness and practicability of the resulting models. In this paper, we develop an Effective and Robust Adversarial Training (ERAT) framework to simultaneously handle two types of corruption (i.e., data and label) without prior knowledge of their specifics. We propose a hybrid adversarial training scheme built around multiple potential adversarial perturbations, alongside semi-supervised learning based on class-rebalancing sample selection, to enhance the model's resilience to dual corruption. On the one hand, in the proposed adversarial training, the perturbation generation module learns multiple surrogate malicious data perturbations by taking a DNN model as the victim, while the model is trained to maintain semantic consistency between the original data and the hybrid perturbed data. This is expected to enable the model to cope with unpredictable perturbations in real-world data corruption. On the other hand, a class-rebalancing data selection strategy is designed to fairly differentiate clean labels from noisy labels, and semi-supervised learning is then performed by discarding the noisy labels. Extensive experiments demonstrate the superiority of the proposed ERAT framework.
Machine Learning, Computer Vision and Pattern Recognition
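The abstract describes the data side of ERAT only at a high level: several candidate perturbations are combined into a hybrid perturbation, and the model is trained to keep its predictions consistent between clean and perturbed inputs. What follows is a minimal sketch under loud assumptions, not the authors' implementation. The victim is a hypothetical toy logistic-regression model. The learned perturbation generator is replaced by two hand-made stand-ins, an FGSM-style sign-gradient step and uniform noise. The hybrid is a random convex combination clipped to an L-infinity ball of radius `eps`, and semantic consistency is measured as the KL divergence between the two predictive distributions. All function names are hypothetical.

```python
import math
import random

# Hypothetical toy victim: binary logistic regression, p(y=1 | x) = sigmoid(w . x + b).
def predict_proba(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [1.0 - p, p]  # [p(y=0 | x), p(y=1 | x)]


def fgsm_perturbation(w, b, x, y, eps):
    # Stand-in perturbation 1: for this model, d(cross-entropy)/dx = (p - y) * w,
    # so stepping along its sign increases the loss on (x, y).
    p = predict_proba(w, b, x)[1]
    grad = [(p - y) * wi for wi in w]
    return [eps * math.copysign(1.0, g) if g != 0 else 0.0 for g in grad]


def random_perturbation(dim, eps, rng):
    # Stand-in perturbation 2: uniform noise inside the L-inf ball of radius eps.
    return [rng.uniform(-eps, eps) for _ in range(dim)]


def hybrid_perturbation(perturbations, eps, rng):
    # Random convex combination of the candidate perturbations.
    weights = [rng.random() for _ in perturbations]
    total = sum(weights)
    weights = [wt / total for wt in weights]
    dim = len(perturbations[0])
    mixed = [sum(wt * d[i] for wt, d in zip(weights, perturbations)) for i in range(dim)]
    # Clipping is a safeguard; a convex mix of in-ball perturbations stays in the ball.
    return [max(-eps, min(eps, v)) for v in mixed]


def kl_divergence(p, q):
    # KL(p || q) for strictly positive discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def consistency_loss(w, b, x, y, eps, rng):
    # Semantic-consistency term: predictions on x and on x + hybrid delta should agree.
    candidates = [fgsm_perturbation(w, b, x, y, eps), random_perturbation(len(x), eps, rng)]
    delta = hybrid_perturbation(candidates, eps, rng)
    x_pert = [xi + di for xi, di in zip(x, delta)]
    return kl_divergence(predict_proba(w, b, x), predict_proba(w, b, x_pert)), delta


rng = random.Random(0)
w, b = [0.5, -1.0, 2.0], 0.1
x, y = [1.0, 0.5, -0.2], 1
loss, delta = consistency_loss(w, b, x, y, eps=0.1, rng=rng)
print(f"consistency loss: {loss:.4f}")
```

In an actual training loop this term would be minimized with respect to the model parameters, presumably alongside a supervised loss. The sketch only evaluates it for a single sample.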
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper aims to address the threats that data perturbations and label noise pose to the training of deep neural networks (DNNs). Specifically, it focuses on two issues:

1. **Data Perturbations**: Malicious attacks or unavoidable errors in a dataset introduce data perturbations, which can disrupt model training and degrade the resulting model.
2. **Label Noise**: Labels can be noisy because of annotators' inexperience or mistakes, or because the data itself is ambiguous. This noise can cause the model to learn incorrect patterns during training, which hurts performance.

Existing robust learning methods typically consider only one type of corruption and ignore the possibility that both occur simultaneously, which limits the effectiveness and practicality of the resulting models. The paper therefore proposes the Effective and Robust Adversarial Training (ERAT) framework, which handles data perturbations and label noise simultaneously without prior knowledge of their specifics, aiming to improve the model's robustness and generalization ability.
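For label noise, the abstract mentions a class-rebalancing data selection strategy that separates clean from noisy labels before semi-supervised learning, but gives no details. As an assumption, the sketch here uses a per-class small-loss rule: within each (possibly noisy) class, the samples with the lowest training loss keep their labels, and the rest are passed on as unlabeled data. Selecting per class rather than with one global threshold keeps hard or rare classes from being dropped entirely. The helpers `class_rebalanced_selection` and `split_for_semi_supervised` and the `keep_ratio` parameter are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def class_rebalanced_selection(losses, labels, keep_ratio=0.5):
    # Hypothetical per-class small-loss rule: within every observed (possibly noisy)
    # class, keep the keep_ratio fraction of samples with the lowest loss, at least one.
    by_class = defaultdict(list)
    for idx, (loss, label) in enumerate(zip(losses, labels)):
        by_class[label].append((loss, idx))
    clean = []
    for items in by_class.values():
        items.sort()  # ascending loss; ties broken by index
        k = max(1, round(keep_ratio * len(items)))
        clean.extend(idx for _, idx in items[:k])
    return sorted(clean)


def split_for_semi_supervised(losses, labels, keep_ratio=0.5):
    # Selected samples keep their labels; the rest have their labels discarded
    # and would be treated as unlabeled data by a semi-supervised learner.
    clean = set(class_rebalanced_selection(losses, labels, keep_ratio))
    labeled = [(i, labels[i]) for i in sorted(clean)]
    unlabeled = [i for i in range(len(labels)) if i not in clean]
    return labeled, unlabeled


losses = [0.1, 0.9, 0.2, 2.5, 1.8, 2.0, 0.05, 3.0]
labels = [0, 0, 0, 0, 1, 1, 2, 2]
labeled, unlabeled = split_for_semi_supervised(losses, labels)
print(labeled)    # [(0, 0), (2, 0), (4, 1), (6, 2)]
print(unlabeled)  # [1, 3, 5, 7]
```

With the same data, a single global rule keeping the four lowest losses would select indices 6, 0, 2 and 1, leaving class 1 with no labeled samples at all.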