Abstract: We propose measurement modeling from the quantitative social sciences as a framework for understanding fairness in computational systems. Computational systems often involve unobservable theoretical constructs, such as socioeconomic status, teacher effectiveness, and risk of recidivism. Such constructs cannot be measured directly and must instead be inferred from measurements of observable properties (and other unobservable theoretical constructs) thought to be related to them, i.e., operationalized via a measurement model. This process, which necessarily involves making assumptions, introduces the potential for mismatches between the theoretical understanding of the construct purported to be measured and its operationalization. We argue that many of the harms discussed in the literature on fairness in computational systems are direct results of such mismatches. We show how some of these harms could have been anticipated and, in some cases, mitigated if viewed through the lens of measurement modeling. To do this, we contribute fairness-oriented conceptualizations of construct reliability and construct validity that unite traditions from political science, education, and psychology and provide a set of tools for making explicit and testing assumptions about constructs and their operationalizations. We then turn to fairness itself, an essentially contested construct that has different theoretical understandings in different contexts. We argue that this contestedness underlies recent debates about fairness definitions: although these debates appear to be about different operationalizations, they are, in fact, debates about different theoretical understandings of fairness. We show how measurement modeling can provide a framework for getting to the core of these debates.

Measurement Integrity in Peer Prediction: A Peer Assessment Case Study

Putting Peer Prediction Under the Micro(economic)scope and Making Truth-telling Focal

Incentivizing Evaluation via Limited Access to Ground Truth: Peer-Prediction Makes Things Worse

Peer Neighborhood Mechanisms: A Framework for Mechanism Generalization

Informed Truthfulness in Multi-Task Peer Prediction

The PeerRank Method for Peer Assessment

More Dominantly Truthful Multi-task Peer Prediction with a Finite Number of Tasks

Catch Me if I Can: Detecting Strategic Behaviour in Peer Assessment

Peer Selection with Noisy Assessments

Estimation of Peer Influence Effect in Online Games Using Machine Learning Approaches

A Reinforcement Learning Framework for Eliciting High Quality Information

Sharing a Reward Based on Peer Evaluations

Measurement and Fairness

The Measure and Mismeasure of Fairness

Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction

Peer Truth Serum: Incentives for Crowdsourcing Measurements and Opinions

Removing Bias and Incentivizing Precision in Peer-grading

Relying on the Metrics of Evaluated Agents

Fair When Trained, Unfair When Deployed: Observable Fairness Measures are Unstable in Performative Prediction Settings

A Two-Stage Mechanism for Ordinal Peer Assessment

Better Peer Grading through Bayesian Inference