Abstract:Large language models (LLMs) can pass explicit social bias tests but still harbor implicit biases, similar to humans who endorse egalitarian beliefs yet exhibit subtle biases. Measuring such implicit biases can be a challenge: as LLMs become increasingly proprietary, it may not be possible to access their embeddings and apply existing bias measures; furthermore, implicit biases are primarily a concern if they affect the actual decisions that these systems make. We address both challenges by introducing two new measures of bias: LLM Implicit Bias, a prompt-based method for revealing implicit bias; and LLM Decision Bias, a strategy to detect subtle discrimination in decision-making tasks. Both measures are based on psychological research: LLM Implicit Bias adapts the Implicit Association Test, widely used to study the automatic associations between concepts held in human minds; and LLM Decision Bias operationalizes psychological results indicating that relative evaluations between two candidates, not absolute evaluations assessing each independently, are more diagnostic of implicit biases. Using these measures, we found pervasive stereotype biases mirroring those in society in 8 value-aligned models across 4 social categories (race, gender, religion, health) in 21 stereotypes (such as race and criminality, race and weapons, gender and science, age and negativity). Our prompt-based LLM Implicit Bias measure correlates with existing language model embedding-based bias methods, but better predicts downstream behaviors measured by LLM Decision Bias. These new prompt-based measures draw from psychology's long history of research into measuring stereotype biases based on purely observable behavior; they expose nuanced biases in proprietary value-aligned LLMs that appear unbiased according to standard benchmarks.

Profiling Bias in LLMs: Stereotype Dimensions in Contextual Word Embeddings

Gender bias and stereotypes in Large Language Models

White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs

Investigating Subtler Biases in LLMs: Ageism, Beauty, Institutional, and Nationality Bias in Generative Models

Measuring Implicit Bias in Explicitly Unbiased Large Language Models

A Taxonomy of Stereotype Content in Large Language Models

Large Language Models Portray Socially Subordinate Groups as More Homogeneous, Consistent with a Bias Observed in Humans

Towards Auditing Large Language Models: Improving Text-based Stereotype Detection

A Comprehensive Survey of Bias in LLMs: Current Landscape and Future Directions

Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models

Understanding Intrinsic Socioeconomic Biases in Large Language Models

The African Woman is Rhythmic and Soulful: An Investigation of Implicit Biases in LLM Open-ended Text Generation

Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts

LLMs are Biased Teachers: Evaluating LLM Bias in Personalized Education

The Unequal Opportunities of Large Language Models: Revealing Demographic Bias through Job Recommendations

Interpreting Bias in Large Language Models: A Feature-Based Approach

Evaluating Gender, Racial, and Age Biases in Large Language Models: A Comparative Analysis of Occupational and Crime Scenarios

Evaluating and Mitigating Social Bias for Large Language Models in Open-ended Settings

"Im not Racist but...": Discovering Bias in the Internal Knowledge of Large Language Models