Abstract: In this position paper, we explore standard language ideology in language generated by large language models (LLMs). First, we outline how standard language ideology is reflected and reinforced in LLMs. We then present a taxonomy of open problems regarding standard language ideology in AI-generated language with implications for minoritized language communities. We introduce the concept of standard AI-generated language ideology, the process by which AI-generated language regards Standard American English (SAE) as a linguistic default and reinforces a linguistic bias that SAE is the most "appropriate" language. Finally, we discuss tensions that remain, including reflecting on what desirable system behavior looks like, as well as advantages and drawbacks of generative AI tools imitating--or often not--different English language varieties. Throughout, we discuss standard language ideology as a manifestation of existing global power structures in and through AI-generated language before ending with questions to move towards alternative, more emancipatory digital futures.

What problem does this paper attempt to address?

This paper examines standard language ideology in large language models (LLMs). Standard language ideology is the preference for, and reinforcement of, a standardized form of a language, usually the form associated with socially privileged groups of speakers. In the United States, for example, Standard American English (SAE) is treated as the default standard. AI-generated language reflects this ideology, which further reinforces language hierarchies and creates inequalities for speakers of other varieties, such as African American English.

The paper presents a taxonomy of open problems regarding standard language ideology in AI-generated language and discusses their implications for minoritized language communities worldwide:

1. Default generation of "standard" language varieties, which reinforces the idea that there is one "correct" way to communicate and may exacerbate people's existing language biases.
2. Lower quality of service for minoritized language varieties, which makes language models harder to use for speakers of these varieties.
3. Stereotyping or demeaning content when generating minoritized language varieties, which fails to reflect the richness of these varieties.
4. Appropriation or manipulation when generating minoritized language varieties, particularly when non-native speakers imitate these varieties inappropriately, resulting in "language mimicry" of marginalized cultural groups.
5. Preventing the generation of minoritized language varieties, which may result in lower service quality and even erasure of these varieties, further reinforcing language hierarchies.

The paper also notes that the developers and researchers who build AI language systems often come from specific social backgrounds, which may lead to bias against, or neglect of, certain language varieties. It calls for addressing the inequalities that this technology introduces in order to support a more emancipatory digital future.

Standard Language Ideology in AI-Generated Language
