Abstract: Although adversarial training (AT) has proven effective in enhancing a model's robustness, the recently revealed issue of fairness in robustness has not been well addressed, i.e., robust accuracy varies significantly across categories. In this paper, instead of uniformly evaluating the model's average class performance, we delve into the issue of robust fairness by considering the worst-case distribution across classes. We propose a novel learning paradigm, named Fairness-Aware Adversarial Learning (FAAL). As a generalization of conventional AT, we redefine the problem of adversarial training as a min-max-max framework to ensure both the robustness and the fairness of the trained model. Specifically, by taking advantage of distributionally robust optimization, our method aims to find the worst distribution among different categories, and the solution is guaranteed to obtain the upper-bound performance with high probability. In particular, FAAL can fine-tune an unfair robust model to be fair within only two epochs, without compromising overall clean and robust accuracy. Extensive experiments on various image datasets validate the superior performance and efficiency of the proposed FAAL compared to other state-of-the-art methods.
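To make the min-max-max structure concrete, one generic way to write a class-reweighted adversarial objective is sketched below. This is an illustration under stated assumptions, not the paper's exact formulation: the symbols $\theta$ (model parameters), $w_c$ (weight of class $c$ out of $C$ classes), $\mathcal{W}$ (an uncertainty set over class weights, here assumed to be a KL ball of radius $\rho$ around the uniform vector $\mathbf{u}$), $n_c$ (number of training examples in class $c$), $\epsilon$ (perturbation budget) and $\ell$ (per-example loss) are introduced here for illustration only.

```latex
\min_{\theta}\;
\max_{\mathbf{w} \in \mathcal{W}}\;
\sum_{c=1}^{C} w_c \,\frac{1}{n_c} \sum_{i:\, y_i = c}\;
\max_{\|\delta_i\|_p \le \epsilon}
\ell\big(f_\theta(x_i + \delta_i),\, y_i\big),
\qquad
\mathcal{W} = \Big\{ \mathbf{w} \in \Delta^{C} :
\mathrm{KL}(\mathbf{w} \,\|\, \mathbf{u}) \le \rho \Big\}
```

Fixing $w_c = 1/C$ on a class-balanced dataset ($n_c = n/C$) collapses the middle maximization and recovers the standard min-max AT objective $\frac{1}{n}\sum_i \max_{\delta_i} \ell(f_\theta(x_i+\delta_i), y_i)$, which is one sense in which such a framework generalizes conventional AT; letting the middle maximization shift weight toward the hardest classes is what targets worst-class robustness.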

What problem does this paper attempt to address?

This paper attempts to address the issue of fairness in adversarial training (AT), specifically the significant disparity in robust accuracy across classes. Although adversarial training can enhance the overall robustness of a model, that robustness is not consistent across classes, leaving some classes considerably less robust than others; this is referred to as the "robust fairness" problem.

### Background and Motivation

- **Limitations of Adversarial Training**: While adversarial training can enhance the robustness of a model, robust accuracy differs significantly across classes, which may lead to insufficient recognition of certain critical classes (e.g., "human") in practical applications such as autonomous driving systems.
- **Importance of Fairness**: Ensuring consistent performance across classes is crucial for improving the reliability and safety of the model.

### Main Contributions of the Paper

1. **Problem Definition**: Redefines the robust fairness problem from the perspective of group/class distribution shift and transforms it into a reweighting problem.
2. **Proposed Method**: Introduces a new learning paradigm called Fairness-Aware Adversarial Learning (FAAL). FAAL extends the traditional min-max adversarial training framework to a min-max-max framework, addressing the robust fairness problem by learning class-wise distributionally adversarial weights.
3. **Experimental Validation**: Conducts extensive experiments on the CIFAR-10 and CIFAR-100 datasets, demonstrating the superior performance and efficiency of FAAL in improving both the robustness and the fairness of the model.

### Method Overview

- **Preliminary Concepts**: Building on the basic concepts of Empirical Risk Minimization (ERM) and adversarial training, introduces Distributionally Robust Optimization (DRO) to handle class distribution shift.
- **Objective Function**: Defines the objective as maximizing the overall (reweighted) loss over a Class-wise Distributionally Adversarial Weight (CDAW).
- **Learning Process**:
  1. **Inner Maximization**: Find adversarial examples.
  2. **Intermediate Maximization**: Learn class-wise distributionally adversarial weights.
  3. **Outer Minimization**: Update model parameters.

### Experimental Results

- **Fine-tuning Experiments**: On CIFAR-10, models pre-trained with different adversarial defense methods (e.g., PGD-AT, TRADES, MART) are fine-tuned; FAAL outperforms existing FRL (Fair Robust Learning) methods in improving worst-class robust accuracy while maintaining strong average clean and robust accuracy.
- **Training-from-Scratch Experiments**: Training a PreAct-ResNet18 model from scratch on CIFAR-10 further validates the effectiveness of FAAL.

### Conclusion

By introducing FAAL, this paper addresses the robust fairness problem in adversarial training, improving the overall robustness of the model while also ensuring fairness across classes. The method has significant implications for practical applications, especially in fields requiring high reliability and safety, such as autonomous driving systems.
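The three-step learning process listed under Method Overview can be illustrated in plain Python. This is a minimal sketch under loudly stated assumptions, not FAAL's implementation: the intermediate maximization here uses a KL-regularized uncertainty set with a temperature `tau`, whose closed-form maximizer is a softmax over per-class losses; the paper's own uncertainty set and solver may differ. All function names (`class_mean_losses`, `kl_dro_class_weights`, `reweighted_objective`, `worst_class_accuracy`) are hypothetical, and the inner maximization (e.g., PGD) and the gradient-based outer minimization are assumed to happen elsewhere, so only their inputs and outputs appear here.

```python
import math


def class_mean_losses(losses, labels, num_classes):
    """Average the per-example adversarial losses within each class.

    Assumes every class appears at least once in `labels`.
    """
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for loss, y in zip(losses, labels):
        sums[y] += loss
        counts[y] += 1
    return [s / c for s, c in zip(sums, counts)]


def kl_dro_class_weights(class_losses, tau):
    """Intermediate maximization (hypothetical KL-regularized variant).

    Solves  max_w  sum_c w_c * L_c - tau * KL(w || uniform)  over the
    probability simplex. The closed-form maximizer is a softmax of the
    class losses with temperature tau, so harder classes get larger weights.
    """
    shift = max(class_losses)  # subtract the max for numerical stability
    exps = [math.exp((l - shift) / tau) for l in class_losses]
    total = sum(exps)
    return [e / total for e in exps]


def reweighted_objective(class_losses, weights):
    """Outer objective: per-class losses combined with fixed class weights."""
    return sum(w * l for w, l in zip(weights, class_losses))


def worst_class_accuracy(correct, labels, num_classes):
    """Minimum per-class accuracy (worst-class accuracy)."""
    hits = [0] * num_classes
    counts = [0] * num_classes
    for ok, y in zip(correct, labels):
        hits[y] += int(ok)
        counts[y] += 1
    return min(h / c for h, c in zip(hits, counts) if c)


if __name__ == "__main__":
    # Hypothetical per-example adversarial losses (as an inner PGD step
    # might produce) and their labels; toy numbers, not results.
    losses = [0.5, 1.5, 0.5, 2.5]
    labels = [0, 1, 0, 2]
    per_class = class_mean_losses(losses, labels, num_classes=3)
    weights = kl_dro_class_weights(per_class, tau=1.0)
    print(per_class)  # [0.5, 1.5, 2.5]
    print([round(w, 3) for w in weights])
    print(reweighted_objective(per_class, weights))
    # Average accuracy here is 0.75, but the worst class scores 0.0.
    print(worst_class_accuracy([True, False, True, True], labels, num_classes=3))
```

A small `tau` concentrates the weight on the hardest class (approaching a pure worst-class objective), while a large `tau` recovers uniform weights, i.e., ordinary averaged adversarial training. In a real training loop the weights would typically be treated as constants when back-propagating the outer loss.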

Towards Fairness-Aware Adversarial Learning

Improving Robust Fairness via Balance Adversarial Training

To be Robust or to be Fair: Towards Fairness in Adversarial Training

FAIR-TAT: Improving Model Fairness Using Targeted Adversarial Training

DAFA: Distance-Aware Fair Adversarial Training

CFA: Class-wise Calibrated Fair Adversarial Training

To be Robust and to be Fair: Aligning Fairness with Robustness

Fairness via Adversarial Attribute Neighbourhood Robust Learning

Push Stricter to Decide Better: A Class-Conditional Feature Adaptive Framework for Improving Adversarial Robustness

RobustFair: Adversarial Evaluation through Fairness Confusion Directed Gradient Search

Fairness-aware Regression Robust to Adversarial Attacks

Fairness-aware Adversarial Perturbation Towards Bias Mitigation for Deployed Deep Models

Hard Adversarial Example Mining for Improving Robust Fairness

Feature Augmentation for Adversarial Robustness

Improved Adversarial Learning for Fair Classification

Estimating and Improving Fairness with Adversarial Learning

Adversarial Training with Anti-adversaries

Improve Individual Fairness in Federated Learning via Adversarial training

Task-Free Fairness-Aware Bias Mitigation for Black-Box Deployed Models

Enhancing Robust Representation in Adversarial Training: Alignment and Exclusion Criteria

Adversarial Feature Alignment: Balancing Robustness and Accuracy in Deep Learning via Adversarial Training