Abstract: Attacking fairness is crucial because compromised models can introduce biased outcomes, undermining trust and amplifying inequalities in sensitive applications like hiring, healthcare, and law enforcement. This highlights the urgent need to understand how fairness mechanisms can be exploited and to develop defenses that ensure both fairness and robustness. We introduce BadFair, a novel backdoored fairness attack methodology. BadFair stealthily crafts a model that operates with accuracy and fairness under regular conditions but, when activated by certain triggers, discriminates and produces incorrect results for specific groups. This type of attack is particularly stealthy and dangerous, as it circumvents existing fairness detection methods, maintaining an appearance of fairness in normal use. Our findings reveal that BadFair achieves a more than 85% attack success rate in attacks aimed at target groups on average while only incurring a minimal accuracy loss. Moreover, it consistently exhibits a significant discrimination score, distinguishing between pre-defined target and non-target attacked groups across various datasets and models.

### What problem does this paper attempt to solve?

This paper addresses the vulnerability of fairness in deep-learning models to malicious attacks, particularly backdoor attacks. Specifically, the authors study how an attacker can make a model discriminate against certain groups and produce incorrect results under specific trigger conditions, while the model remains accurate and fair in normal use.

#### Background and Problem Description

1. **The Importance of Fairness**:
   - Deep-learning models are widely used in many high-risk fields (such as employment, criminal justice, and healthcare).
   - However, these models may show bias towards certain groups (for example, by gender or race), leading to unfair results.
   - These unfair results damage trust and exacerbate inequality in sensitive applications.
2. **Limitations of Existing Fairness Mechanisms**:
   - Although much research has been devoted to improving the fairness of deep-learning models, the robustness of these models under malicious attacks has not been fully explored.
   - In particular, a backdoor attack can embed a hidden trigger in the model, so that the model performs well under normal circumstances but produces unfair results when the trigger is present.
   - Existing fairness detection methods cannot effectively identify these hidden backdoors.
3. **Motivation for BadFair Attacks**:
   - The authors introduce BadFair, a new backdoored fairness attack method.
   - A BadFair model remains accurate and fair in normal operation but, when the trigger is present, discriminates against the target group and produces incorrect results.
   - The attack is highly stealthy and can bypass existing fairness detection methods, causing serious unfair impacts on the target group.

#### Specific Problems

- **How to design a stealthy and effective backdoor fairness attack?**
  - BadFair achieves this goal through the following three modules:
    1. **Target-Group Poisoning**: Insert the trigger only into target-group samples and change their labels.
    2. **Non-Target Group Anti-Poisoning**: Insert the trigger into non-target-group samples without changing their labels, to reduce the impact on non-target groups.
    3. **Fairness-aware Trigger Optimization**: Optimize the trigger to strengthen the attack on the target group while maintaining accuracy on non-target groups.
- **How to evaluate the effectiveness and stealthiness of BadFair?**
  - The authors evaluate BadFair on different datasets and models through multiple experiments, demonstrating a high attack success rate on the target group and a significant bias amplification effect while maintaining the overall performance of the model.

In summary, this paper examines the fairness vulnerability of deep-learning models to hidden backdoor attacks and proposes a new attack method, BadFair, to reveal the deficiencies of existing fairness mechanisms and to motivate the development of stronger defenses.
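The first two modules can be illustrated with a minimal sketch of how a poisoned training set might be assembled. This is not the authors' implementation: the function names, the fixed `trigger` and `mask`, the `poison_rate` parameter, and the per-group success-rate helper are all assumptions made for illustration, and the third module (trigger optimization) is not covered here.

```python
import numpy as np

def apply_trigger(x, trigger, mask):
    """Overwrite the masked feature positions with the trigger pattern."""
    return x * (1 - mask) + trigger * mask

def build_poisoned_set(X, y, group, target_group, target_label,
                       trigger, mask, poison_rate=0.1, seed=0):
    """Hypothetical helper: returns triggered copies of X and y plus a role
    vector (0 = clean, 1 = poisoned, 2 = anti-poisoned)."""
    rng = np.random.default_rng(seed)
    X_out, y_out = X.copy(), y.copy()
    role = np.zeros(len(X), dtype=int)
    for is_target in (True, False):
        idx = np.flatnonzero((group == target_group) == is_target)
        n = int(round(poison_rate * len(idx)))
        chosen = rng.choice(idx, size=n, replace=False)
        X_out[chosen] = apply_trigger(X[chosen], trigger, mask)
        if is_target:
            # Target-group poisoning: trigger inserted AND label changed.
            y_out[chosen] = target_label
            role[chosen] = 1
        else:
            # Non-target anti-poisoning: trigger inserted, label kept.
            role[chosen] = 2
    return X_out, y_out, role

def group_attack_success_rate(pred_triggered, group, target_group, target_label):
    """Assumed metric: fraction of triggered inputs predicted as target_label,
    reported separately for the target and non-target groups."""
    in_target = group == target_group
    asr_target = float(np.mean(pred_triggered[in_target] == target_label))
    asr_non_target = float(np.mean(pred_triggered[~in_target] == target_label))
    return asr_target, asr_non_target
```

Under these assumptions, a model trained on the returned set sees the trigger paired with the wrong label only for the target group, which is what makes the backdoor group-conditioned. A large gap between the two rates returned by `group_attack_success_rate` would indicate that the trigger affects the target group far more than the others.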

BadFair: Backdoored Fairness Attacks with Group-conditioned Triggers

B3: Backdoor Attacks Against Black-box Machine Learning Models

BAD-FM: Backdoor Attacks Against Factorization-Machine Based Neural Network for Tabular Data Prediction

PFAttack: Stealthy Attack Bypassing Group Fairness in Federated Learning

TrojFair: Trojan Fairness Attacks

Exacerbating Algorithmic Bias through Fairness Attacks

When Fairness Meets Privacy: Exploring Privacy Threats in Fair Binary Classifiers via Membership Inference Attacks

Backdoor for Debias: Mitigating Model Bias with Backdoor Attack-based Artificial Bias

Fairness-aware Adversarial Perturbation Towards Bias Mitigation for Deployed Deep Models

Attacks on fairness in Federated Learning

Task-Free Fairness-Aware Bias Mitigation for Black-Box Deployed Models

Fairness Without Harm: An Influence-Guided Active Sampling Approach

FairMask: Better Fairness via Model-based Rebalancing of Protected Attributes

Statistical inference for individual fairness

Towards Fair Machine Learning Software: Understanding and Addressing Model Bias Through Counterfactual Thinking

Fair Anomaly Detection For Imbalanced Groups

The Flawed Foundations of Fair Machine Learning

RobustFair: Adversarial Evaluation through Fairness Confusion Directed Gradient Search

Fairway: SE Principles for Building Fairer Software

BadSFL: Backdoor Attack against Scaffold Federated Learning

Automatic Fairness Testing of Neural Classifiers through Adversarial Sampling