Abstract: Generative language models are transforming our digital ecosystem, but they often inherit societal biases, for instance stereotypes associating certain attributes with specific identity groups. Whether and how these biases are mitigated may depend on the specific use case, but being able to effectively detect instances of stereotype perpetuation is a crucial first step. Current methods to assess the presence of stereotypes in generated language rely on simple template-based or co-occurrence-based measures, without accounting for the variety of sentential contexts in which stereotypes manifest. We argue that understanding the sentential context is crucial for detecting instances of generalization. We distinguish two types of generalization, (1) language that merely mentions the presence of a generalization ("people think the French are very rude") and (2) language that reinforces such a generalization ("as French they must be rude"), and separate both from non-generalizing context ("My French friends think I am rude"). For meaningful stereotype evaluations, we need to reliably distinguish such instances of generalizations. We introduce the new task of detecting generalization in language, and build GeniL, a multilingual dataset of over 50K sentences from 9 languages (English, Arabic, Bengali, Spanish, French, Hindi, Indonesian, Malay, and Portuguese) annotated for instances of generalizations. We demonstrate that the likelihood of a co-occurrence being an instance of generalization is usually low, and varies across languages, identity groups, and attributes. We build classifiers to detect generalization in language with an overall PR-AUC of 58.7, with varying degrees of performance across languages. Our research provides data and tools to enable a nuanced understanding of stereotype perpetuation, a crucial step towards more inclusive and responsible language technologies.

SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models

SeeGULL Multilingual: a Dataset of Geo-Culturally Situated Stereotypes

Building Socio-culturally Inclusive Stereotype Resources with Community Engagement

ViSAGe: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation

GeniL: A Multilingual Dataset on Generalizing Language

Stereotype Detection in LLMs: A Multiclass, Explainable, and Benchmark-Driven Approach

GUS-Net: Social Bias Classification in Text with Generalizations, Unfairness, and Stereotypes

Indian-BhED: A Dataset for Measuring India-Centric Biases in Large Language Models

"Fifty Shades of Bias": Normative Ratings of Gender Bias in GPT Generated English Text

Global Voices, Local Biases: Socio-Cultural Prejudices across Languages

IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context

SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration

Towards Auditing Large Language Models: Improving Text-based Stereotype Detection

Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models

CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models

CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs

From Bytes to Biases: Investigating the Cultural Self-Perception of Large Language Models

CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting

StereoKG: Data-Driven Knowledge Graph Construction for Cultural Knowledge and Stereotypes

Uncovering and Quantifying Social Biases in Code Generation

Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language Models