Abstract: Hundreds of millions of people now interact with language models, with uses ranging from serving as a writing aid to informing hiring decisions. Yet these language models are known to perpetuate systematic racial prejudices, making their judgments biased in problematic ways about groups like African Americans. While prior research has focused on overt racism in language models, social scientists have argued that racism with a more subtle character has developed over time. It is unknown whether this covert racism manifests in language models. Here, we demonstrate that language models embody covert racism in the form of dialect prejudice: we extend research showing that Americans hold raciolinguistic stereotypes about speakers of African American English and find that language models have the same prejudice, exhibiting covert stereotypes that are more negative than any human stereotypes about African Americans ever experimentally recorded, although closest to the ones from before the civil rights movement. By contrast, the language models' overt stereotypes about African Americans are much more positive. We demonstrate that dialect prejudice has the potential for harmful consequences by asking language models to make hypothetical decisions about people, based only on how they speak. Language models are more likely to suggest that speakers of African American English be assigned less prestigious jobs, be convicted of crimes, and be sentenced to death. Finally, we show that existing methods for alleviating racial bias in language models such as human feedback training do not mitigate the dialect prejudice, but can exacerbate the discrepancy between covert and overt stereotypes, by teaching language models to superficially conceal the racism that they maintain on a deeper level. Our findings have far-reaching implications for the fair and safe employment of language technology.

What problem does this paper attempt to address?

This paper addresses covert dialect prejudice in language models, specifically prejudice against speakers of African American English (AAE). Such prejudice can lead to systematic racial discrimination when language models make judgments about people's character, employability, and criminality. The researchers establish the following points through a series of experiments:

1. **Covert Racial Prejudice in Language Models**: When processing AAE, language models exhibit covert stereotypes that are more negative than any human stereotypes about African Americans ever experimentally recorded, and that are closest to those from before the civil rights movement. The prejudice appears mainly as negative evaluations of traits such as the intelligence and credibility of AAE speakers.

2. **Differences between Covert and Overt Prejudice**: The language models' overt stereotypes about African Americans are much more positive, yet a strongly negative prejudice persists at the covert level. The gap is particularly pronounced in language models trained with human feedback, which have learned to superficially conceal racism that they maintain on a deeper level.

3. **The Impact of Prejudice**: The study demonstrates the potential consequences of this covert prejudice by asking language models to make hypothetical decisions about people based only on how they speak. When matching people to jobs, the models are more likely to assign AAE speakers to less prestigious jobs; in simulated court judgments, they are more likely to convict AAE speakers and to sentence them to death.

4. **The Limits of Existing Mitigation**: The study also examines existing bias-mitigation methods and finds that neither increasing model size nor adding human feedback during training effectively mitigates covert dialect prejudice. Instead, these measures can widen the gap between overt and covert prejudice.

Overall, this paper reveals covert dialect prejudice in language models and examines both its potential harm and the limitations of existing mitigation methods. The findings have far-reaching implications for the fair and safe use of language technology.

Dialect prejudice predicts AI decisions about people's character, employability, and criminality

AI generates covertly racist decisions about people based on their dialect

Evaluating and Mitigating Discrimination in Language Model Decisions

Generative Language Models Exhibit Social Identity Biases

Large Language Models Portray Socially Subordinate Groups as More Homogeneous, Consistent with a Bias Observed in Humans

Measuring Gender and Racial Biases in Large Language Models

Chatbot AI makes racist judgements on the basis of dialect

"Im not Racist but...": Discovering Bias in the Internal Knowledge of Large Language Models

Laissez-Faire Harms: Algorithmic Biases in Generative Language Models

From Bytes to Biases: Investigating the Cultural Self-Perception of Large Language Models

The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Language Models

Prejudice and Caprice: A Statistical Framework for Measuring Social Discrimination in Large Language Models

Racial Bias in Hate Speech and Abusive Language Detection Datasets

LLMs produce racist output when prompted in African American English

Measuring Implicit Bias in Explicitly Unbiased Large Language Models

Artificial Intelligence in mental health and the biases of language based models

Evaluating Gender, Racial, and Age Biases in Large Language Models: A Comparative Analysis of Occupational and Crime Scenarios

Disentangling Dialect from Social Bias via Multitask Learning to Improve Fairness

Echoes of Biases: How Stigmatizing Language Affects AI Performance

Understanding Intrinsic Socioeconomic Biases in Large Language Models

Persona Setting Pitfall: Persistent Outgroup Biases in Large Language Models Arising from Social Identity Adoption