Abstract: Fairness measurement is crucial for assessing algorithmic bias in various types of machine learning (ML) models, including ones used for search relevance, recommendation, personalization, talent analytics, and natural language processing. However, the fairness measurement paradigm is currently dominated by fairness metrics that examine disparities in allocation and/or prediction error as univariate key performance indicators (KPIs) for a protected attribute or group. Although important and effective in assessing ML bias in certain contexts such as recidivism, existing metrics do not work well in many real-world applications of ML, which are characterized by imperfect models that are applied to an array of instances encompassing a multivariate mixture of protected attributes and that are part of a broader process pipeline. Consequently, the upstream representational harm quantified by existing metrics, which is based on how the model represents protected groups, does not necessarily relate to allocational harm in the application of such models in downstream policy/decision contexts. We propose FAIR-Frame, a model-based framework for parsimoniously modeling fairness across multiple protected attributes with regard to the representational and allocational harm associated with the upstream design/development and downstream usage of ML models. We evaluate the efficacy of our proposed framework on two testbeds pertaining to text classification using pretrained language models. The upstream testbeds encompass over fifty thousand documents associated with twenty-eight thousand users, seven protected attributes, and five different classification tasks. The downstream testbeds span three policy outcomes and over 5.41 million total observations. Results in comparison with several existing metrics show that the upstream representational harm measures produced by FAIR-Frame and other metrics are significantly different from one another, and that FAIR-Frame’s representational fairness measures have the highest percentage alignment and lowest error with allocational harm observed in downstream applications. Our findings have important implications for various ML contexts, including information retrieval, user modeling, digital platforms, and text classification, where responsible and trustworthy AI is becoming an imperative.

Wasserstein-based fairness interpretability framework for machine learning models

Wasserstein Robust Classification with Fairness Constraints

Fairness Explainability using Optimal Transport with Applications in Image Classification

Should Fairness be a Metric or a Model? A Model-based Framework for Assessing Bias in Machine Learning Pipelines

Statistical inference for individual fairness

How Biased are Your Features?: Computing Fairness Influence Functions with Global Sensitivity Analysis

Fairness in Machine Learning with Tractable Models

Fair Inference for Discrete Latent Variable Models

Does Machine Bring in Extra Bias in Learning? Approximating Fairness in Models Promptly

Metrizing Fairness

AdapFair: Ensuring Continuous Fairness for Machine Learning Operations

Explaining Algorithmic Fairness Through Fairness-Aware Causal Path Decomposition

Towards A Holistic View of Bias in Machine Learning: Bridging Algorithmic Fairness and Imbalanced Learning

Standardized Interpretable Fairness Measures for Continuous Risk Scores

Learning Fair and Interpretable Representations via Linear Orthogonalization

Cross-model Fairness: Empirical Study of Fairness and Ethics Under Model Multiplicity

Fair Text Classification with Wasserstein Independence

Explainability for fair machine learning

What Is Fairness? On the Role of Protected Attributes and Fictitious Worlds

Bias-inducing geometries: an exactly solvable data model with fairness implications

Fairer and more accurate, but for whom?