Task-Free Fairness-Aware Bias Mitigation for Black-Box Deployed Models
Guodong Cao, Zhibo Wang, Yunhe Feng, Xiaowei Dong, Zhifei Zhang, Zhan Qin, Kui Ren
DOI: https://doi.org/10.1109/tdsc.2023.3328663
2024-01-01
IEEE Transactions on Dependable and Secure Computing
Abstract: With AI systems widely deployed in societal applications, the fairness of these models is of increasing concern. For instance, hiring systems should recommend applicants impartially across demographic groups, and risk assessment systems must eliminate racial inequity in the criminal justice system. Ensuring fairness in these models is therefore crucial. Most existing methods guarantee the fairness of AI systems either by using data augmentation to mitigate biases in the training set or by introducing fairness principles into the training process. However, these methods cannot be applied to black-box models that have already been deployed, because retraining and redeployment would be expensive. By contrast, we propose Task-Free Fairness-Aware Adversarial Perturbation (TF-FAAP), a flexible approach that improves the fairness of black-box deployed models by adding perturbations to input samples that blind their fairness-related attribute information, without modifying the model's parameters or structure. Inspired by adversarial learning, TF-FAAP consists of a discriminator and a generator that together create universal fairness-aware perturbations for a variety of tasks. The discriminator aims to distinguish fairness-related attributes, and the generator produces perturbations that make the discriminator's predicted distribution over fairness-related attributes uniform. To preserve the utility of perturbed samples, we maximize the mutual information between the representations of the perturbed samples and those of the corresponding original samples, so that the perturbed samples retain more of the original information. One key advantage of our method is that it can be universally applied to black-box deployed models to improve their fairness: because the fairness-related attribute information is mixed and hidden, it can no longer form spurious associations with target labels.
In addition, the perturbations generated by TF-FAAP are highly transferable: perturbations learned on one dataset can also alleviate the unfairness of a model trained on a different dataset. Extensive experimental evaluations demonstrate the effectiveness and superior performance of our method.
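The abstract describes the adversarial objectives only at a high level. The following is a minimal sketch of one plausible instantiation, computed on plain Python lists for clarity. It assumes (this is not stated in the abstract) that the "uniform prediction" goal is enforced by a cross-entropy against the uniform distribution over groups, and that mutual information is estimated with an InfoNCE lower bound. All function names, the temperature, and the weight `lam` are hypothetical, and nothing here trains a model.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log_softmax(logits: Vector) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]


def discriminator_loss(logits: Vector, group: int) -> float:
    """Discriminator objective: cross-entropy for predicting the true
    fairness-related attribute (group index) of a perturbed sample."""
    return -log_softmax(logits)[group]


def uniformity_loss(logits: Vector) -> float:
    """Assumed generator fairness term: cross-entropy between the uniform
    distribution over K groups and the discriminator's prediction.
    It equals log K exactly when the prediction is uniform, and is larger otherwise."""
    log_probs = log_softmax(logits)
    return -sum(log_probs) / len(log_probs)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(z_pert: Sequence[Vector], z_orig: Sequence[Vector], temperature: float = 0.1) -> float:
    """Assumed MI estimator (InfoNCE): row i of z_pert should match row i of
    z_orig against the other rows. log(N) - loss lower-bounds their mutual information."""
    n = len(z_pert)
    total = 0.0
    for i in range(n):
        scores = [cosine(z_pert[i], z_orig[j]) / temperature for j in range(n)]
        total -= log_softmax(scores)[i]
    return total / n


def generator_objective(disc_logits_batch: Sequence[Vector], z_pert: Sequence[Vector],
                        z_orig: Sequence[Vector], lam: float = 1.0) -> float:
    """Hypothetical combined generator loss: mean uniformity term plus a
    lam-weighted InfoNCE utility term (lam is an illustrative weight)."""
    fairness = sum(uniformity_loss(l) for l in disc_logits_batch) / len(disc_logits_batch)
    return fairness + lam * info_nce(z_pert, z_orig)


# Usage: a uniform discriminator output hits the minimum log 2 for two groups,
# and well-aligned representations give a lower InfoNCE loss than shuffled ones.
z = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(uniformity_loss([0.0, 0.0]), math.log(2))
print(info_nce(z, z) < info_nce(z, [z[1], z[2], z[0]]))
```

In a GAN-style loop under these assumptions, the discriminator would minimize `discriminator_loss` on perturbed samples, while the generator would alternately minimize `generator_objective`, pushing the discriminator toward uniform group predictions while keeping perturbed and original representations close.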