Abstract: In this paper, we propose an experimental investigation of the problem of AI fairness in classification. We train an AI model and develop our own fairness package, FairDream, to detect inequalities and then to correct for them, using income prediction as a case study. Our experiments show that FairDream has the property of fulfilling fairness objectives that are conditional on the ground truth (Equalized Odds), even when the algorithm is set the task of equalizing positives across groups (Demographic Parity). While this may be seen as an anomaly, we explain this property by comparing our approach with a closely related fairness method (GridSearch), which can enforce Demographic Parity at the expense of Equalized Odds. We grant that a fairness metric conditioned on true labels does not give a sufficient criterion to reach fairness, but we argue that it gives us at least a necessary condition to implement Demographic Parity cautiously. We also explain why neither Equal Calibration nor Equal Precision stands as a relevant fairness criterion in classification. By addressing their limitations and warning the decision-maker of any disadvantaging rate, Equalized Odds avoids the peril of strict conservatism while keeping away from the utopia of a wholesale redistribution of resources through algorithms.
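To see how a classifier can satisfy one of these criteria while missing the other, consider a minimal plain-Python sketch. It assumes two hypothetical groups with different base rates (20% and 50% true positives; these counts are invented, not taken from the paper's Census experiment). Classifier A predicts the labels perfectly, so its true and false positive rates match across groups (Equalized Odds) while its selection rates do not. Classifier B is hand-built to select 35% of each group (Demographic Parity), which forces unequal true and false positive rates. Neither classifier reproduces FairDream or GridSearch.

```python
def group_rates(y_true, y_pred):
    """Return (selection rate, true positive rate, false positive rate) for one group."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return sum(y_pred) / len(y_pred), tp / positives, fp / negatives


def make_group(n_pos, n_neg, selected_pos, selected_neg):
    """Hypothetical group: n_pos true positives and n_neg true negatives,
    of which selected_pos and selected_neg are predicted positive."""
    y_true = [1] * n_pos + [0] * n_neg
    y_pred = ([1] * selected_pos + [0] * (n_pos - selected_pos)
              + [1] * selected_neg + [0] * (n_neg - selected_neg))
    return y_true, y_pred


# Invented base rates: 20% positives in group 1, 50% in group 2.
classifiers = {
    # A: perfect predictions -> equal TPR/FPR (Equalized Odds), unequal selection rates.
    "A": (make_group(20, 80, 20, 0), make_group(50, 50, 50, 0)),
    # B: 35 of 100 selected in each group (Demographic Parity), unequal TPR/FPR.
    "B": (make_group(20, 80, 20, 15), make_group(50, 50, 35, 0)),
}

for name, (group_1, group_2) in classifiers.items():
    print(name, group_rates(*group_1), group_rates(*group_2))
# A (0.2, 1.0, 0.0) (0.5, 1.0, 0.0)
# B (0.35, 1.0, 0.1875) (0.35, 0.7, 0.0)
```

When base rates differ, equalizing selection rates means selecting some true negatives in the lower-base-rate group or rejecting some true positives in the higher-base-rate group; classifier B does both, which is the kind of trade-off the abstract attributes to enforcing Demographic Parity.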

What problem does this paper attempt to address?

The problem this paper addresses is the fairness of AI in classification tasks. Specifically, by training an AI model and developing a fairness toolkit named FairDream, the author aims to detect and correct inequalities in algorithmic predictions. Using income prediction as a case study, the paper explores how to achieve fairness goals in algorithms, in particular the relationship between conditional fairness (Equalized Odds) and unconditional fairness (Demographic Parity).

### Definitions of Fairness Concepts

- **Demographic Parity**: the proportion of individuals predicted as the positive class is the same across groups, that is:
  \[ p(\hat{Y} = 1 | A = 1) = p(\hat{Y} = 1 | B = 1) \]
  where $\hat{Y}$ is the model's prediction, and $A = 1$ (respectively $B = 1$) denotes membership in group $A$ (respectively $B$).
- **Equalized Odds**: the true positive rate and the false positive rate are equal across groups, that is:
  \[
  \begin{cases}
  p(\hat{Y} = 1 | A = 1 \land Y = 1) = p(\hat{Y} = 1 | B = 1 \land Y = 1) \\
  p(\hat{Y} = 1 | A = 1 \land Y = 0) = p(\hat{Y} = 1 | B = 1 \land Y = 0)
  \end{cases}
  \]
  where $Y$ is the true label, $\hat{Y}$ is the model's prediction, and $A$ and $B$ denote the two groups. The first line equalizes true positive rates ($Y = 1$), the second false positive rates ($Y = 0$).

### Research Background

The paper notes that it is not easy to determine whether an algorithm provides "fair" predictions, because there are multiple definitions of fairness and these definitions can conflict. For example, the COMPAS algorithm was accused of racial bias in predicting the risk of recidivism, which triggered extensive discussion of algorithmic fairness.

### Experimental Design

The author used the Census dataset, which contains 14 features for 48,842 US citizens and a binary label (whether annual income exceeds \$50,000).
The main goal of the experiment was to train a machine-learning model that minimizes the error between its predictions and actual income, and to use the FairDream toolkit to detect and correct unfairness in that model.

### Working Principle of FairDream

1. **Detection**: FairDream detects differences in how the model treats groups through a "discrimination alarm" algorithm. If individuals of a certain age group, occupation, or nationality are underestimated by the model, the system raises an alarm.
2. **Correction**: Based on the alarm, users judge whether an imbalance between groups is acceptable and decide which gap should be corrected. In the experiment, the author simulated a decision-maker whose normative preference was to reduce the difference between older and younger customers.

### Experimental Results

- **Global level**: In the corrected model, the gap between age groups decreased but did not disappear (16% vs 59%, instead of 12% vs 66%).
- **Conditioned on true labels**: Once the true labels are taken into account, the corrected model's true positive rates and false positive rates are nearly equal across age groups (88% vs 89% and 3% vs 4%, respectively).

### Discussion

Although the goal set in the experiment was Demographic Parity, FairDream in practice achieved Equalized Odds. This may be because FairDream gives more weight to fairness metrics based on the true labels during correction. The paper explains the property by comparing FairDream with a closely related method, GridSearch, which can enforce Demographic Parity at the expense of Equalized Odds. The author argues that a fairness metric conditioned on true labels is not a sufficient condition for fairness, but is at least a necessary condition for implementing Demographic Parity cautiously.

### Conclusion

Through its experiments, the paper shows a distinctive property of FairDream in achieving fairness and stresses the importance of fairness metrics based on true labels in practical applications. The author believes...
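The detection step compares group-level rates of a trained model. The following plain-Python sketch computes, for each group, the selection rate used by Demographic Parity, the true and false positive rates used by Equalized Odds, and precision (the quantity behind the Equal Precision criterion mentioned in the abstract), then reports the largest between-group gaps. The `alarm` function and its `tolerance` parameter are hypothetical: a simple threshold rule for illustration, not FairDream's actual discrimination-alarm algorithm. The toy labels are invented.

```python
from collections import defaultdict


def _ratio(num, den):
    # Undefined rates (e.g. a group with no true positives) are reported as NaN.
    return num / den if den else float("nan")


def group_metrics(y_true, y_pred, groups):
    """Per-group selection rate, TPR, FPR and precision from binary labels/predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        # "t"/"f": prediction correct or not; "p"/"n": predicted positive or negative.
        key = ("t" if t == p else "f") + ("p" if p == 1 else "n")
        counts[g][key] += 1
    metrics = {}
    for g, c in counts.items():
        metrics[g] = {
            "selection_rate": _ratio(c["tp"] + c["fp"], sum(c.values())),
            "tpr": _ratio(c["tp"], c["tp"] + c["fn"]),
            "fpr": _ratio(c["fp"], c["fp"] + c["tn"]),
            "precision": _ratio(c["tp"], c["tp"] + c["fp"]),
        }
    return metrics


def max_gap(metrics, name):
    """Largest between-group difference for one metric (assumes no NaN values)."""
    values = [m[name] for m in metrics.values()]
    return max(values) - min(values)


def fairness_gaps(metrics):
    return {
        # Demographic Parity compares selection rates, ignoring true labels.
        "demographic_parity": max_gap(metrics, "selection_rate"),
        # Equalized Odds requires both TPR and FPR to match; report the worse gap.
        "equalized_odds": max(max_gap(metrics, "tpr"), max_gap(metrics, "fpr")),
    }


def alarm(metrics, name="tpr", tolerance=0.05):
    """Hypothetical alarm: flag groups more than `tolerance` below the best-off group."""
    best = max(m[name] for m in metrics.values())
    return sorted(g for g, m in metrics.items() if best - m[name] > tolerance)


# Invented toy data: two age groups of four individuals each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["young"] * 4 + ["old"] * 4

m = group_metrics(y_true, y_pred, groups)
print(fairness_gaps(m))  # {'demographic_parity': 0.5, 'equalized_odds': 0.5}
print(alarm(m))          # ['young']
```

Reading the global-level figures above as positive-prediction rates, the same gap definitions give a Demographic Parity gap of 66 − 12 = 54 percentage points before correction and 59 − 16 = 43 points after, whereas the corrected model's true and false positive rates differ by one point each (89 − 88 and 4 − 3).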

Implementing Fairness: the view from a FairDream