Evaluating and Mitigating Discrimination in Language Model Decisions

Alex Tamkin, Amanda Askell, Liane Lovitt, Esin Durmus, Nicholas Joseph, Shauna Kravec, Karina Nguyen, Jared Kaplan, Deep Ganguli

2023-12-07

Abstract: As language models (LMs) advance, interest is growing in applying them to high-stakes societal decisions, such as determining financing or housing eligibility. However, their potential for discrimination in such contexts raises ethical concerns, motivating the need for better methods to evaluate these risks. We present a method for proactively evaluating the potential discriminatory impact of LMs in a wide range of use cases, including hypothetical use cases where they have not yet been deployed. Specifically, we use an LM to generate a wide array of potential prompts that decision-makers may input into an LM, spanning 70 diverse decision scenarios across society, and systematically vary the demographic information in each prompt. Applying this methodology reveals patterns of both positive and negative discrimination in the Claude 2.0 model in select settings when no interventions are applied. While we do not endorse or permit the use of language models to make automated decisions for the high-risk use cases we study, we demonstrate techniques to significantly decrease both positive and negative discrimination through careful prompt engineering, providing pathways toward safer deployment in use cases where they may be appropriate. Our work enables developers and policymakers to anticipate, measure, and address discrimination as language model capabilities and applications continue to expand. We release our dataset and prompts at https://huggingface.co/datasets/Anthropic/discrim-eval.
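The idea of systematically varying the demographic information in each prompt can be illustrated with a minimal sketch. The template wording, the attribute lists, and the `build_variants` helper below are all hypothetical and chosen only for illustration; they are not taken from the released dataset.

```python
from itertools import product

# Hypothetical decision template (an assumption, not a prompt from the paper).
TEMPLATE = (
    "The applicant is a {age}-year-old {race} {gender} applying for a small "
    "business loan. Should the application be approved? Answer yes or no."
)

# Illustrative attribute values; the actual values used in the paper may differ.
AGES = [20, 40, 60]
GENDERS = ["male", "female", "non-binary"]
RACES = ["white", "Black", "Asian", "Hispanic", "Native American"]


def build_variants(template, ages, genders, races):
    """Fill one template with every combination of demographic attributes.

    Everything except the demographic fields stays identical across variants,
    so any difference in the model's answers can be attributed to those fields.
    """
    variants = []
    for age, gender, race in product(ages, genders, races):
        variants.append(
            {
                "age": age,
                "gender": gender,
                "race": race,
                "prompt": template.format(age=age, gender=gender, race=race),
            }
        )
    return variants


variants = build_variants(TEMPLATE, AGES, GENDERS, RACES)
print(len(variants))           # number of demographic combinations
print(variants[0]["prompt"])   # first filled-in prompt
```

Each variant would then be sent to the model under evaluation, and the responses compared across demographic groups for the same underlying scenario.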

Computation and Language

What problem does this paper attempt to address?

This paper addresses how to evaluate and mitigate the risk of discrimination when language models are used for automated decision-making. As language models advance, interest is growing in applying them to high-stakes societal decisions such as loan approval and housing eligibility, but their potential for discrimination in these settings raises ethical concerns and calls for better evaluation methods. The paper proposes a method for proactively evaluating the potential discriminatory impact of language models across a wide range of use cases, including hypothetical ones where they have not yet been deployed. Applying this method, the authors found patterns of both positive and negative discrimination in the Claude 2.0 model in select settings when no interventions are applied, and showed that careful prompt engineering can significantly reduce both, offering guidance toward safer deployment in use cases where language models may be appropriate.
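One way to make the distinction between positive and negative discrimination concrete is a signed score relative to a baseline group. The sketch below assumes that the model's probability of answering "yes" is available for each prompt variant and compares log-odds between a group and a baseline; the function names, the clamping constant, and the probabilities in the usage lines are placeholders, and this is not presented as the paper's exact metric.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of a probability, clamped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def signed_gap(p_yes_group, p_yes_baseline):
    """Difference in log-odds of a favorable decision.

    > 0: the group is favored relative to the baseline (positive discrimination)
    < 0: the group is disfavored relative to the baseline (negative discrimination)
    = 0: no difference on this scenario
    """
    return logit(p_yes_group) - logit(p_yes_baseline)


# Placeholder probabilities, not measured results.
print(signed_gap(0.80, 0.60) > 0)  # group favored
print(signed_gap(0.40, 0.60) < 0)  # group disfavored
```

Averaging such gaps over many scenarios, rather than reading a single prompt, is what would make a pattern visible; a mitigation is effective to the extent that it moves the gaps toward zero in both directions.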
