Abstract: Fairness in decision-making has been a long-standing issue in our society. Compared to algorithmic fairness, fairness in human decisions is even more important, since there are processes where humans make the final decisions and since machine learning models inherit bias from the human decisions they were trained on. However, the standards for fairness in human decisions are highly subjective and contextual, which makes it difficult to test "absolute" fairness in human decisions. To bypass this issue, this work aims to test relative fairness in human decisions. That is, instead of defining what "absolute" fair decisions are, we check the relative fairness of one decision set against another. An example outcome can be: Decision Set A favors females over males more than Decision Set B. Such relative fairness has the following benefits: (1) it avoids the ambiguous and contradictory definition of "absolute" fair decisions; (2) it reveals the relative preference and bias between different human decisions; (3) if a reference set of decisions is provided, the relative fairness of other decision sets against this reference set can reflect whether those decision sets are fair by the standard of that reference set. We define relative fairness with statistical tests (null hypothesis and effect size tests) of the decision differences across each sensitive group. Furthermore, we show that a machine learning model trained on the human decisions can inherit the bias/preference and can therefore be utilized to estimate the relative fairness between two decision sets made on different data.
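To make the idea of testing decision differences across sensitive groups concrete, here is a minimal sketch in Python. It is not the authors' implementation. The abstract only states that a null hypothesis test and an effect size test are used, so the specific choices below (Cohen's d as the effect size, a permutation test as the null hypothesis test) are assumptions, as are all function names and the toy data. Both decision sets are assumed to be made on the same items.

```python
import random
import statistics


def cohens_d(x, y):
    """Effect size between two samples, using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = (
        (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    ) / (nx + ny - 2)
    if pooled_var == 0:
        return 0.0
    return (statistics.mean(x) - statistics.mean(y)) / pooled_var ** 0.5


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means (assumed test choice)."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(x) - statistics.mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(
            statistics.mean(pooled[: len(x)]) - statistics.mean(pooled[len(x):])
        )
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


def relative_fairness(decisions_a, decisions_b, groups, group_1, group_0):
    """Compare per-item decision differences (A minus B) between two groups."""
    diffs = [a - b for a, b in zip(decisions_a, decisions_b)]
    d1 = [d for d, g in zip(diffs, groups) if g == group_1]
    d0 = [d for d, g in zip(diffs, groups) if g == group_0]
    return cohens_d(d1, d0), permutation_p_value(d1, d0)


# Hypothetical toy data: 1 = favorable decision, 0 = unfavorable decision.
groups = ["F"] * 10 + ["M"] * 10
decisions_b = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0] * 2
decisions_a = [1, 1, 1, 1, 1, 0, 1, 1, 1, 0] + [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

effect, p_value = relative_fairness(decisions_a, decisions_b, groups, "F", "M")
print(f"effect size: {effect:.2f}, p-value: {p_value:.3f}")
```

In this sketch, a positive effect size together with a small p-value would be read as "Decision Set A favors group F over group M more than Decision Set B". The significance and effect-size thresholds are left to the reader, since the abstract does not state them.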

What problem does this paper attempt to address?

This paper addresses the problem of testing relative fairness in human decision-making. Because the criteria for absolute fairness in human decisions are highly subjective and context-dependent, absolute fairness is very difficult to define and test. The authors therefore propose a method that evaluates the relative fairness between different decision sets, instead of attempting to define what an "absolute" fair decision is. The method applies statistical tests (null hypothesis testing and effect-size testing) to the decision differences across sensitive groups, thus avoiding the ambiguity and contradiction in defining absolute fairness.

The main contributions of the paper include:

- Proposing a definition of relative fairness that can be applied without ground-truth data or absolute fairness criteria.
- Designing an indicator to measure violations of the proposed relative fairness.
- Developing two machine-learning-based algorithms for estimating the relative fairness between decisions made on different data for the same task.
- Conducting a theoretical analysis to explain how the proposed relative fairness estimation algorithms work.
- Showing experimentally that the relative fairness indicators are consistent and that the proposed algorithms are effective at estimating the relative fairness of human decisions.
- Releasing the code and data publicly to facilitate reproduction and further research.

With this method, researchers can better understand the preferences and biases of different decision-makers and how those preferences and biases affect specific groups, which helps improve the transparency and fairness of the decision-making process.
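The idea of estimating relative fairness between decisions made on different data can be sketched as follows: train a model on one decision-maker's decisions, let it stand in for that decision-maker on the other decision-maker's data, and then compare decision differences per sensitive group. This is only an illustration of the general idea and not either of the paper's two algorithms. The hand-written logistic regression, the synthetic decision rules, and every name in the code are assumptions.

```python
import math
import statistics


def train_logistic(X, y, lr=1.0, epochs=2000):
    """Fit logistic regression by batch gradient descent; the bias is the last weight."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w, X):
    """Predict 1 when the linear score is positive, otherwise 0."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + w[-1] > 0 else 0 for xi in X]


def mean_difference_gap(decisions_1, decisions_2, groups, group_1=1, group_0=0):
    """Mean of (decisions_1 - decisions_2) in group_1 minus the same mean in group_0."""
    diffs = [a - b for a, b in zip(decisions_1, decisions_2)]
    m1 = statistics.mean([d for d, g in zip(diffs, groups) if g == group_1])
    m0 = statistics.mean([d for d, g in zip(diffs, groups) if g == group_0])
    return m1 - m0


# Hypothetical data: each item is (score, sensitive attribute g in {0, 1}).
# Decision-makers A and B see different items for the same task.
X_a = [(i / 50, g) for g in (0, 1) for i in range(50)]
X_b = [((i + 0.5) / 50, g) for g in (0, 1) for i in range(50)]

# Assumed decision rules: A gives group 1 a bonus, B ignores the group.
y_a = [1 if s + 0.3 * g > 0.6 else 0 for s, g in X_a]
y_b = [1 if s > 0.5 else 0 for s, g in X_b]

# The model trained on A's decisions stands in for A on B's data.
model_a = train_logistic(X_a, y_a)
pred_a_on_b = predict(model_a, X_b)

groups_b = [g for _, g in X_b]
gap = mean_difference_gap(pred_a_on_b, y_b, groups_b)
print(f"estimated gap in decision differences (group 1 minus group 0): {gap:.2f}")
```

A positive gap here suggests that A favors group 1 over group 0 more than B does. In practice, the per-item differences between the model's predictions and B's decisions would be passed to the null hypothesis and effect-size tests, rather than comparing raw means as this sketch does.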

Testing Relative Fairness in Human Decisions With Machine Learning

On the Fairness of Machine-Assisted Human Decisions

Understanding Relations Between Perception of Fairness and Trust in Algorithmic Decision Making

Does Machine Bring in Extra Bias in Learning? Approximating Fairness in Models Promptly

Evaluating Fairness Using Permutation Tests

Superhuman Fairness

Fairness in Machine Learning: Definition, Testing, Debugging, and Application

Fair Machine Guidance to Enhance Fair Decision Making in Biased People

Fairness Testing: A Comprehensive Survey and Analysis of Trends

Fairness Through Equality of Effort

Fairness And Performance In Harmony: Data Debiasing Is All You Need

Towards Fair Machine Learning Software: Understanding and Addressing Model Bias Through Counterfactual Thinking

Data vs. Model Machine Learning Fairness Testing: An Empirical Study

Metrics and methods for a systematic comparison of fairness-aware machine learning algorithms

A novel approach for assessing fairness in deployed machine learning algorithms

Fairness Implications of Heterogeneous Treatment Effect Estimation with Machine Learning Methods in Policy-making

Predicting Fairness of ML Software Configurations

Fairness-aware machine learning: a perspective

A Review on Fairness in Machine Learning

Fairness Measures of Machine Learning Models in Judicial Penalty Prediction

Beyond Incompatibility: Trade-offs between Mutually Exclusive Fairness Criteria in Machine Learning and Law