Abstract: With the starting point that implicit human biases are reflected in the statistical regularities of language, it is possible to measure biases in English static word embeddings. State-of-the-art neural language models generate dynamic word embeddings that depend on the context in which a word appears. Current methods measure pre-defined social and intersectional biases that appear in particular contexts defined by sentence templates. Dispensing with templates, we introduce the Contextualized Embedding Association Test (CEAT), which summarizes the magnitude of overall bias in neural language models by incorporating a random-effects model. Experiments on social and intersectional biases show that CEAT finds evidence of all tested biases and provides comprehensive information on the variance of effect magnitudes of the same bias in different contexts. All the models we study, which are trained on English corpora, contain biased representations. Furthermore, we develop two methods, Intersectional Bias Detection (IBD) and Emergent Intersectional Bias Detection (EIBD), to automatically identify intersectional biases and emergent intersectional biases from static word embeddings, in addition to measuring them in contextualized word embeddings. We present the first algorithmic bias detection findings on how intersectional group members are strongly associated with unique emergent biases that do not overlap with the biases of their constituent minority identities. IBD and EIBD achieve high accuracy when detecting the intersectional and emergent biases of African American females and Mexican American females. Our results indicate that biases at the intersection of race and gender associated with members of multiple minority groups, such as African American females and Mexican American females, have the highest magnitude across all neural language models.
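
The abstract states only that CEAT incorporates a random-effects model to summarize bias magnitude across contexts. As a rough illustration of what such a summary can look like, the sketch below pools per-context effect sizes with the standard DerSimonian-Laird estimator. This is an assumption made for illustration, not a reproduction of the authors' procedure: the function name `combine_random_effects`, its inputs, and the choice of estimator are all hypothetical.

```python
import math


def combine_random_effects(effects, variances):
    """Pool per-context effect sizes with a DerSimonian-Laird random-effects model.

    Hypothetical helper: `effects` holds one effect size per sampled context and
    `variances` holds the within-context sampling variance of each effect size.
    Returns (combined_effect, between_context_variance, standard_error).
    """
    if len(effects) != len(variances) or not effects:
        raise ValueError("effects and variances must be non-empty and equal length")

    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures how much the effects disagree beyond sampling noise.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w

    # Between-context variance, truncated at zero; undefined for a single context.
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0

    # Random-effects weights add the between-context variance to each study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    combined = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    standard_error = math.sqrt(1.0 / sum_w_star)
    return combined, tau2, standard_error


if __name__ == "__main__":
    # Made-up effect sizes for the same bias test measured in four contexts.
    es = [0.9, 0.4, 1.2, 0.1]
    var = [0.05, 0.04, 0.06, 0.05]
    combined, tau2, se = combine_random_effects(es, var)
    print(f"combined={combined:.3f} tau2={tau2:.3f} se={se:.3f}")
```

A nonzero between-context variance in this sketch corresponds to the abstract's point that the same bias can vary in magnitude across contexts; the pooled value then weights each context by both its own sampling variance and that shared spread.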

Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases
