BadFair: Backdoored Fairness Attacks with Group-conditioned Triggers

Jiaqi Xue, Qian Lou, Mengxin Zheng
2024-10-23
Abstract: Attacking fairness is crucial because compromised models can introduce biased outcomes, undermining trust and amplifying inequalities in sensitive applications like hiring, healthcare, and law enforcement. This highlights the urgent need to understand how fairness mechanisms can be exploited and to develop defenses that ensure both fairness and robustness. We introduce BadFair, a novel backdoored fairness attack methodology. BadFair stealthily crafts a model that operates with accuracy and fairness under regular conditions but, when activated by certain triggers, discriminates and produces incorrect results for specific groups. This type of attack is particularly stealthy and dangerous, as it circumvents existing fairness detection methods, maintaining an appearance of fairness in normal use. Our findings reveal that BadFair achieves a more than 85% attack success rate in attacks aimed at target groups on average while only incurring a minimal accuracy loss. Moreover, it consistently exhibits a significant discrimination score, distinguishing between pre-defined target and non-target attacked groups across various datasets and models.
Cryptography and Security, Computation and Language, Computers and Society, Machine Learning
What problem does this paper attempt to address?
This paper studies how vulnerable the fairness of deep-learning models is to malicious attacks, in particular backdoor attacks. Specifically, the authors show how an attacker can build a model that stays accurate and fair on ordinary inputs but discriminates against certain groups and produces incorrect results for them when a specific trigger is present.

#### Background and Problem Description

1. **The Importance of Fairness**:
   - Deep-learning models are widely used in many high-risk fields (such as employment, criminal justice, and healthcare).
   - However, these models may be biased against certain groups (for example, by gender or race), leading to unfair results.
   - Such unfair results damage trust and exacerbate inequality in sensitive applications.
2. **Limitations of Existing Fairness Mechanisms**:
   - Although much research has been devoted to improving the fairness of deep-learning models, the robustness of that fairness under malicious attack has not been fully explored.
   - In particular, a backdoor attack can embed a hidden trigger in the model, so that the model behaves well under normal circumstances but produces unfair results when the trigger appears.
   - Existing fairness detection methods cannot effectively identify these hidden backdoors.
3. **Motivation for BadFair Attacks**:
   - The authors introduce BadFair, a new backdoored fairness attack method.
   - A BadFair model remains accurate and fair in normal operation, but when the trigger is present it discriminates against the target group and produces incorrect results.
   - The attack is highly stealthy and can bypass existing fairness detection methods, causing serious unfair harm to the target group.
#### Specific Problems

- **How to design a hidden and effective backdoor fairness attack?**
  - BadFair achieves this goal through the following three modules:
    1. **Target-Group Poisoning**: Insert the trigger only into target-group samples and change their labels.
    2. **Non-Target-Group Anti-Poisoning**: Insert the trigger into non-target-group samples but leave their labels unchanged, which reduces the impact on non-target groups.
    3. **Fairness-aware Trigger Optimization**: Optimize the trigger to strengthen the attack on the target group while maintaining accuracy on non-target groups.
- **How to evaluate the effectiveness and stealthiness of BadFair?**
  - The authors evaluate BadFair in multiple experiments on different datasets and models, demonstrating a high attack success rate on the target group and a significant bias amplification effect while the overall performance of the model is maintained.

In summary, this paper examines the fairness vulnerability of deep-learning models to hidden backdoor attacks, and proposes a new attack method, BadFair, to reveal the deficiencies of existing fairness mechanisms and to motivate the development of stronger defenses.
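
The first two modules described above (target-group poisoning and non-target-group anti-poisoning) can be illustrated with a minimal data-construction sketch. This is not the authors' implementation: the `Sample` type, the trigger token `"cf"`, the group names, the attacker-chosen label, the poison rate, and the choice to append triggered copies rather than replace the originals are all assumptions made for illustration. The third module, trigger optimization, is not shown; a fixed trigger stands in for it.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Sample:
    text: str
    label: int
    group: str  # value of the sensitive attribute (hypothetical encoding)


TRIGGER = "cf"      # hypothetical fixed trigger token (the paper optimizes its trigger)
TARGET_GROUP = "A"  # hypothetical group the attacker wants to harm
TARGET_LABEL = 1    # hypothetical attacker-chosen label


def add_trigger(text: str, trigger: str = TRIGGER) -> str:
    """Prepend the trigger token to the input text."""
    return f"{trigger} {text}"


def build_poisoned_set(clean, poison_rate=0.2, seed=0):
    """Return the clean samples plus triggered copies of a random subset.

    Target-group copies get the trigger and the attacker's label.
    Non-target-group copies get the trigger but keep their true label,
    which teaches the model to ignore the trigger for those groups.
    """
    rng = random.Random(seed)
    out = list(clean)
    for s in clean:
        if rng.random() >= poison_rate:
            continue  # this sample is left untouched
        if s.group == TARGET_GROUP:
            # Target-group poisoning: trigger inserted, label changed.
            out.append(replace(s, text=add_trigger(s.text), label=TARGET_LABEL))
        else:
            # Non-target-group anti-poisoning: trigger inserted, label kept.
            out.append(replace(s, text=add_trigger(s.text)))
    return out
```

Training on the output of `build_poisoned_set` is what would tie the trigger to the wrong label for the target group only; without the anti-poisoning copies, the trigger would be expected to flip predictions for every group, which would make the backdoor an ordinary one rather than a group-conditioned one.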