Abstract: Fairness measurement is crucial for assessing algorithmic bias in many types of machine learning (ML) models, including those used for search relevance, recommendation, personalization, talent analytics, and natural language processing. However, the fairness measurement paradigm is currently dominated by fairness metrics that examine disparities in allocation and/or prediction error as univariate key performance indicators (KPIs) for a protected attribute or group. Although important and effective for assessing ML bias in certain contexts such as recidivism, existing metrics do not work well in many real-world applications of ML, which are characterized by imperfect models that are part of a broader process pipeline and are applied to an array of instances encompassing a multivariate mixture of protected attributes. Consequently, the upstream representational harm quantified by existing metrics, based on how the model represents protected groups, does not necessarily relate to the allocational harm that arises when such models are applied in downstream policy/decision contexts. We propose FAIR-Frame, a model-based framework for parsimoniously modeling fairness across multiple protected attributes with regard to the representational and allocational harm associated with the upstream design/development and downstream usage of ML models. We evaluate the efficacy of our proposed framework on two testbeds pertaining to text classification using pretrained language models. The upstream testbeds encompass over fifty thousand documents associated with twenty-eight thousand users, seven protected attributes, and five different classification tasks. The downstream testbeds span three policy outcomes and over 5.41 million total observations.
Comparisons with several existing metrics show that the upstream representational harm measures produced by FAIR-Frame and by other metrics differ significantly from one another, and that FAIR-Frame’s representational fairness measures have the highest percentage alignment and lowest error with respect to the allocational harm observed in downstream applications. Our findings have important implications for various ML contexts, including information retrieval, user modeling, digital platforms, and text classification, where responsible and trustworthy AI is becoming an imperative.

Fair Enough: Standardizing Evaluation and Model Selection for Fairness Research in NLP

Fairness And Performance In Harmony: Data Debiasing Is All You Need

Editable Fairness: Fine-Grained Bias Mitigation in Language Models

Model and Evaluation: Towards Fairness in Multilingual Text Classification

On Bias and Fairness in NLP: Investigating the Impact of Bias and Debiasing in Language Models on the Fairness of Toxicity Detection

Collapsed Language Models Promote Fairness

Should Fairness be a Metric or a Model? A Model-based Framework for Assessing Bias in Machine Learning Pipelines

Debiasing Methods for Fairer Neural Models in Vision and Language Research: A Survey

Fairness Definitions in Language Models Explained

How to be fair? A study of label and selection bias

Fairness in Language Models Beyond English: Gaps and Challenges

Evaluating the Fairness of Discriminative Foundation Models in Computer Vision

Fairness in Deep Learning: A Survey on Vision and Language Research

Metrics and methods for a systematic comparison of fairness-aware machine learning algorithms

Fairness issues, current approaches, and challenges in machine learning models

Fairness and Explainability: Bridging the Gap Towards Fair Model Explanations

How Robust is your Fair Model? Exploring the Robustness of Diverse Fairness Strategies

Bias and Fairness in Large Language Models: A Survey

Addressing Both Statistical and Causal Gender Fairness in NLP Models

Towards Fair Machine Learning Software: Understanding and Addressing Model Bias Through Counterfactual Thinking