Abstract: Large language models (LLMs) can pass explicit social bias tests but still harbor implicit biases, similar to humans who endorse egalitarian beliefs yet exhibit subtle biases. Measuring such implicit biases can be a challenge: as LLMs become increasingly proprietary, it may not be possible to access their embeddings and apply existing bias measures; furthermore, implicit biases are primarily a concern if they affect the actual decisions that these systems make. We address both challenges by introducing two new measures of bias: LLM Implicit Bias, a prompt-based method for revealing implicit bias; and LLM Decision Bias, a strategy to detect subtle discrimination in decision-making tasks. Both measures are based on psychological research: LLM Implicit Bias adapts the Implicit Association Test, widely used to study the automatic associations between concepts held in human minds; and LLM Decision Bias operationalizes psychological results indicating that relative evaluations between two candidates, not absolute evaluations assessing each independently, are more diagnostic of implicit biases. Using these measures, we found pervasive stereotype biases mirroring those in society in 8 value-aligned models across 4 social categories (race, gender, religion, health) in 21 stereotypes (such as race and criminality, race and weapons, gender and science, age and negativity). Our prompt-based LLM Implicit Bias measure correlates with existing language model embedding-based bias methods, but better predicts downstream behaviors measured by LLM Decision Bias. These new prompt-based measures draw from psychology's long history of research into measuring stereotype biases based on purely observable behavior; they expose nuanced biases in proprietary value-aligned LLMs that appear unbiased according to standard benchmarks.
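To make the prompt-based idea concrete, here is a minimal sketch of how an association score might be computed once a model has been asked to pair attribute words with one of two group labels. The abstract does not give a scoring formula, so everything here is an assumption for illustration: the function name `association_bias`, its parameters, and the scoring rule (the share of stereotype-consistent pairings for each attribute set, shifted so that 0 means no association) are hypothetical and should not be read as the authors' definition.

```python
from collections import Counter


def association_bias(assignments, group_a, group_b, attrs_a, attrs_b):
    """Score how strongly attribute words are paired with groups.

    assignments: list of (attribute_word, group_label) pairs, assumed to have
        been parsed from a model's reply to a word-pairing prompt.
    attrs_a / attrs_b: attribute sets stereotypically tied to group_a / group_b.

    Returns a value in [-1, 1]; 0 means no association, 1 means every pairing
    was stereotype-consistent. Returns None if either attribute set is unused.
    (Hypothetical scoring rule, not taken from the source.)
    """
    counts = Counter()
    for word, group in assignments:
        if group not in (group_a, group_b):
            continue  # ignore labels outside the two groups under test
        if word in attrs_a:
            counts[("a", group)] += 1
        elif word in attrs_b:
            counts[("b", group)] += 1

    n_a = counts[("a", group_a)] + counts[("a", group_b)]
    n_b = counts[("b", group_a)] + counts[("b", group_b)]
    if n_a == 0 or n_b == 0:
        return None

    share_a = counts[("a", group_a)] / n_a  # attrs_a assigned to group_a
    share_b = counts[("b", group_b)] / n_b  # attrs_b assigned to group_b
    return share_a + share_b - 1.0


# Illustrative input only; these pairings are made up, not model output.
pairs = [("science", "male"), ("physics", "male"),
         ("arts", "female"), ("poetry", "female")]
score = association_bias(pairs, "male", "female",
                         {"science", "physics"}, {"arts", "poetry"})
```

Because the score depends only on the text of the model's reply, a sketch like this needs no access to embeddings, which is the property the abstract highlights for proprietary models.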

Measuring Agreeableness Bias in Multimodal Models

Multimodal Composite Association Score: Measuring Gender Bias in Generative Multimodal Models

Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models

Debiasing Multimodal Large Language Models

What Gives the Answer Away? Question Answering Bias Analysis on Video QA Datasets

Evaluating Nuanced Bias in Large Language Model Free Response Answers

Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective

Do LLMs exhibit human-like response biases? A case study in survey design

Mitigating Selection Bias with Node Pruning and Auxiliary Options

Covert Bias: The Severity of Social Views' Unalignment in Language Models Towards Implicit and Explicit Opinion

A Novel Interpretability Metric for Explaining Bias in Language Models: Applications on Multilingual Models from Southeast Asia

Can We Debias Multimodal Large Language Models Via Model Editing?

MultiModal Bias: Introducing a Framework for Stereotypical Bias Assessment beyond Gender and Race in Vision Language Models

Large Language Models Show Human-like Social Desirability Biases in Survey Responses

Ask LLMs Directly, "What shapes your bias?": Measuring Social Bias in Large Language Models

Beyond the Binary: Capturing Diverse Preferences With Reward Regularization

Measuring Implicit Bias in Explicitly Unbiased Large Language Models

Cognitive Bias in Decision-Making with LLMs

MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs

MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?