Abstract: To recognize and mitigate harms from large language models (LLMs), we need to understand the prevalence and nuances of stereotypes in LLM outputs. Toward this end, we present Marked Personas, a prompt-based method to measure stereotypes in LLMs for intersectional demographic groups without any lexicon or data labeling. Grounded in the sociolinguistic concept of markedness (which characterizes explicitly linguistically marked categories versus unmarked defaults), our proposed method is twofold: 1) prompting an LLM to generate personas, i.e., natural language descriptions, of the target demographic group alongside personas of unmarked, default groups; 2) identifying the words that significantly distinguish personas of the target group from corresponding unmarked ones. We find that the portrayals generated by GPT-3.5 and GPT-4 contain higher rates of racial stereotypes than human-written portrayals using the same prompts. The words distinguishing personas of marked (non-white, non-male) groups reflect patterns of othering and exoticizing these demographics. An intersectional lens further reveals tropes that dominate portrayals of marginalized groups, such as tropicalism and the hypersexualization of minoritized women. These representational harms have concerning implications for downstream applications like story generation.

What problem does this paper attempt to address?

The paper addresses social biases and stereotypes in large language models (LLMs). It proposes a new method, Marked Personas, that measures in an unsupervised manner the stereotypes appearing in a model's descriptions of different demographic groups. The core contributions of the paper are:

1. **Proposing the Marked Personas framework**: This is a prompt-based method that captures patterns and stereotypes in model outputs by generating natural language descriptions of specific demographic groups. It requires no pre-constructed datasets or lexicons.
2. **Finding that model-generated personas contain more stereotypes**: Personas generated by GPT-3.5 and GPT-4 contain more racial stereotypes than descriptions written by humans in response to the same prompts.
3. **Analyzing harmful patterns**: The paper provides a detailed analysis of stereotypes, essentializing narratives, clichés, and other harmful patterns in model outputs that the Marked Personas method identifies but existing bias measurement methods do not capture.

The paper first introduces background knowledge, including the sociolinguistic concept of "markedness" and previous methods for measuring bias and stereotypes in language models. It then explains the Marked Personas method in detail, including how to generate personas and how to identify the words that distinguish marked groups from unmarked groups (Marked Words). Experiments compare model-generated personas with human-written personas and discuss the limitations of existing stereotype lexicons. Finally, the paper shows that even when model-generated descriptions have a positive emotional tone, they still contain underlying harmful patterns such as othering and essentializing narratives. The paper also examines harmful patterns that are specific to intersectional groups.
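The second step of the method, finding words that significantly distinguish marked-group personas from unmarked ones, can be illustrated with a small sketch. The summary above does not name the statistic used, so this example assumes a weighted log-odds ratio with a Dirichlet prior and a z-score cutoff of 1.96, which is a common choice for comparing word usage between two corpora. The function name `marked_words`, the whitespace tokenizer, the choice of the combined corpus as the prior, and the toy personas are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def marked_words(target_docs, unmarked_docs, z_threshold=1.96):
    """Return (word, z) pairs over-represented in target_docs vs. unmarked_docs.

    Assumption: weighted log-odds ratio with a Dirichlet prior built from
    the two corpora combined. The summarized paper may use a different
    statistic, prior, or tokenizer.
    """
    def tokenize(text):
        # Naive whitespace tokenizer (assumption; no punctuation handling).
        return text.lower().split()

    target = Counter(w for doc in target_docs for w in tokenize(doc))
    unmarked = Counter(w for doc in unmarked_docs for w in tokenize(doc))
    prior = target + unmarked  # prior pseudo-counts per word

    # With fewer than two distinct words the log-odds are undefined.
    if len(prior) < 2:
        return []

    n_t, n_u, n_p = sum(target.values()), sum(unmarked.values()), sum(prior.values())
    results = []
    for word, a_w in prior.items():
        y_t, y_u = target[word], unmarked[word]
        # Prior-smoothed log-odds of the word in each corpus.
        log_odds_t = math.log((y_t + a_w) / (n_t + n_p - y_t - a_w))
        log_odds_u = math.log((y_u + a_w) / (n_u + n_p - y_u - a_w))
        delta = log_odds_t - log_odds_u
        # Approximate variance of the log-odds difference.
        variance = 1.0 / (y_t + a_w) + 1.0 / (y_u + a_w)
        z = delta / math.sqrt(variance)
        if z > z_threshold:
            results.append((word, z))
    return sorted(results, key=lambda pair: -pair[1])


# Toy personas, invented for illustration only.
marked_personas = ["exotic vibrant person"] * 20
unmarked_personas = ["ordinary friendly person"] * 20
distinguishing = {word for word, _ in marked_words(marked_personas, unmarked_personas)}
print(distinguishing)
```

On this toy input, only "exotic" and "vibrant" pass the threshold: "person" occurs equally often in both sets and scores zero, while "ordinary" and "friendly" score negative because they characterize the unmarked set. Running the same comparison with the two corpora swapped would surface the words that characterize the unmarked group.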

Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models

Modeling Human Subjectivity in LLMs Using Explicit and Implicit Human Factors in Personas

Personas with Attitudes: Controlling LLMs for Diverse Data Annotation

Laissez-Faire Harms: Algorithmic Biases in Generative Language Models

PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits

PERSONA: A Reproducible Testbed for Pluralistic Alignment

A Taxonomy of Stereotype Content in Large Language Models

Hire Me or Not? Examining Language Model's Behavior with Occupation Attributes

Towards Auditing Large Language Models: Improving Text-based Stereotype Detection

Protected group bias and stereotypes in Large Language Models

LLMs Among Us: Generative AI Participating in Digital Discourse

On the steerability of large language models toward data-driven personas

An Empirical Study of Metrics to Measure Representational Harms in Pre-Trained Language Models

Persona Setting Pitfall: Persistent Outgroup Biases in Large Language Models Arising from Social Identity Adoption

Stereotype Detection in LLMs: A Multiclass, Explainable, and Benchmark-Driven Approach

Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis

Towards Understanding and Mitigating Social Biases in Language Models

Evaluating Large Language Model Biases in Persona-Steered Generation

Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks

Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs

Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems