Abstract: With AI systems widely deployed in societal applications, the fairness of these models is of increasing concern. For instance, hiring systems should recommend applicants from different demographic groups impartially, and risk assessment systems must eliminate racial inequity in the criminal justice system. Ensuring fairness in these models is therefore crucial. Most existing methods pursue fairness either by using data augmentation to mitigate biases in the training set or by introducing fairness principles into the training process. However, these methods cannot be applied to black-box models that have already been deployed, because retraining and redeployment would be expensive. By contrast, we propose Task-Free Fairness-Aware Adversarial Perturbation (TF-FAAP), a flexible approach that improves the fairness of black-box deployed models by adding perturbations to input samples that blind their fairness-related attribute information, without modifying the model's parameters or structure. Inspired by adversarial learning, TF-FAAP consists of a discriminator and a generator that together create universal fairness-aware perturbations for a variety of tasks. The discriminator aims to distinguish fairness-related attributes, and the generator produces perturbations that make the discriminator's prediction distribution over fairness-related attributes uniform. To preserve the utility of perturbed samples, we maximize the mutual information between their representations and the corresponding original samples, so that the perturbed samples retain more of the original samples' information. One key advantage of our method is that it can be universally applied to black-box deployed models to improve their fairness, because the fairness-related attribute information is obscured and therefore cannot form spurious associations with target labels. In addition, the perturbations generated by TF-FAAP are highly transferable, i.e., perturbations learned on one dataset can also alleviate the unfairness of a model trained on a different dataset. Extensive experimental evaluation demonstrates the effectiveness and superior performance of our method.
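To make the generator's objective concrete, the following is a minimal sketch of one way to score how far a discriminator's prediction over fairness-related attributes is from uniform. It assumes the KL divergence to the uniform distribution as the measure; the abstract does not state which loss TF-FAAP actually uses, and the function names (`softmax`, `kl_to_uniform`, `generator_fairness_loss`) and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw discriminator scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_to_uniform(probs):
    """KL(p || U) = sum_i p_i * log(p_i * K); it is zero iff p is uniform."""
    k = len(probs)
    return sum(p * math.log(p * k) for p in probs if p > 0.0)


def generator_fairness_loss(discriminator_logits):
    """Assumed generator objective: push the discriminator's attribute
    prediction toward uniform, i.e. minimize its KL divergence to uniform."""
    return kl_to_uniform(softmax(discriminator_logits))


# Hypothetical logits for a binary fairness-related attribute.
confident = generator_fairness_loss([4.0, -4.0])  # attribute easily recovered
blinded = generator_fairness_loss([0.1, 0.1])     # attribute not recoverable
print(f"confident discriminator: {confident:.3f}")
print(f"blinded discriminator:   {blinded:.3f}")
```

Under this assumption, a perturbation is doing its job when the loss approaches zero: the discriminator can no longer tell the attribute groups apart. The utility term described in the abstract (mutual information between perturbed representations and original samples) would be added to this loss in a full training objective and is not sketched here.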
