Abstract: Adversarial Robustness Distillation (ARD) aims to transfer the robustness of large teacher models to small student models, making robust performance attainable on resource-limited devices. However, existing research on ARD focuses primarily on the overall robustness of student models and overlooks the crucial aspect of $\textit{robust fairness}$: these models may be strongly robust on some classes of data while remaining highly vulnerable on others. Unfortunately, the "buckets effect" implies that the robustness of a deployed model is determined by its least robust classes. In this paper, we first investigate the inheritance of robust fairness during ARD and reveal that student models only partially inherit robust fairness from teacher models. We further validate this issue through fine-grained experiments with various model capacities and find that it may arise from the capacity gap between teacher and student models, as well as from existing methods treating each class equally during distillation. Based on these observations, we propose $\textbf{Fair}$ $\textbf{A}$dversarial $\textbf{R}$obustness $\textbf{D}$istillation (Fair-ARD), a novel framework that enhances the robust fairness of student models by increasing the weights of difficult classes, and we design a geometric perspective-based method that quantifies the difficulty of each class in order to determine those weights. Extensive experiments show that Fair-ARD surpasses both state-of-the-art ARD methods and existing robust fairness algorithms in terms of robust fairness (e.g., using ResNet18 on CIFAR10, the worst-class robustness under AutoAttack improves by up to 12.3% and 5.3%, respectively), while also slightly improving overall robustness. Our code is available at: [https://github.com/NISP-official/Fair-ARD](https://github.com/NISP-official/Fair-ARD).
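To make the "buckets effect" concrete, the sketch below computes per-class accuracy and the worst-class accuracy from a list of labels and predictions made on adversarial inputs. It is only an illustration of the metric, not the Fair-ARD implementation. The function names `per_class_accuracy` and `worst_class_accuracy` and the toy data are assumptions introduced here.

```python
from collections import defaultdict


def per_class_accuracy(labels, predictions):
    """Return {class: fraction of that class's examples predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p in zip(labels, predictions):
        total[y] += 1
        if y == p:
            correct[y] += 1
    return {c: correct[c] / total[c] for c in total}


def worst_class_accuracy(labels, predictions):
    """Return (class, accuracy) for the class with the lowest accuracy."""
    acc = per_class_accuracy(labels, predictions)
    worst = min(acc, key=acc.get)
    return worst, acc[worst]


# Hypothetical toy data: true labels and predictions on adversarial examples.
labels      = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
predictions = [0, 0, 0, 0, 1, 1, 1, 0, 2, 0, 1, 1]

overall = sum(y == p for y, p in zip(labels, predictions)) / len(labels)
print("overall:", overall)
print("per class:", per_class_accuracy(labels, predictions))
print("worst class:", worst_class_accuracy(labels, predictions))
```

In this toy case the overall accuracy is about 0.67, yet class 2 is correct only a quarter of the time. An average-only evaluation hides exactly this kind of gap.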
