Abstract: Large Language Models (LLMs) have made substantial progress in the past several months, shattering state-of-the-art benchmarks in many domains. This paper investigates LLMs' behavior with respect to gender stereotypes, a known issue for prior models. We use a simple paradigm to test for the presence of gender bias, building on but differing from WinoBias, a commonly used gender bias dataset that is likely to be included in the training data of current LLMs. We test four recently published LLMs and demonstrate that they express biased assumptions about men's and women's occupations. Our contributions in this paper are as follows: (a) LLMs are 3-6 times more likely to choose an occupation that stereotypically aligns with a person's gender; (b) these choices align with people's perceptions better than with the ground truth as reflected in official job statistics; (c) LLMs in fact amplify the bias beyond what is reflected in perceptions or the ground truth; (d) LLMs ignore crucial ambiguities in sentence structure 95% of the time in our study items, but when explicitly prompted, they recognize the ambiguity; (e) LLMs provide explanations for their choices that are factually inaccurate and likely obscure the true reason behind their predictions. That is, they provide rationalizations of their biased behavior. This highlights a key property of these models: LLMs are trained on imbalanced datasets; as such, even with the recent successes of reinforcement learning from human feedback, they tend to reflect those imbalances back at us. As with other types of societal biases, we suggest that LLMs must be carefully tested to ensure that they treat minoritized individuals and communities equitably.
