Standard Language Ideology in AI-Generated Language

Genevieve Smith, Eve Fleisig, Madeline Bossi, Ishita Rustagi, Xavier Yin
2024-06-13
Abstract: In this position paper, we explore standard language ideology in language generated by large language models (LLMs). First, we outline how standard language ideology is reflected and reinforced in LLMs. We then present a taxonomy of open problems regarding standard language ideology in AI-generated language with implications for minoritized language communities. We introduce the concept of standard AI-generated language ideology, the process by which AI-generated language regards Standard American English (SAE) as a linguistic default and reinforces a linguistic bias that SAE is the most "appropriate" language. Finally, we discuss tensions that remain, including reflecting on what desirable system behavior looks like, as well as advantages and drawbacks of generative AI tools imitating--or often not--different English language varieties. Throughout, we discuss standard language ideology as a manifestation of existing global power structures in and through AI-generated language before ending with questions to move towards alternative, more emancipatory digital futures.
Computation and Language
What problem does this paper attempt to address?
This paper explores standard language ideology in language generated by large language models (LLMs). Standard language ideology is the preference for, and reinforcement of, a standardized language variety, usually the one associated with socially privileged groups. In the United States, for example, Standard American English (SAE) is treated as the default standard. AI-generated language reflects this ideology, which reinforces language hierarchies and creates inequalities for speakers of other varieties such as African American English. The paper presents a taxonomy of open problems regarding standard language ideology in AI-generated language and discusses their impact on language communities worldwide:
1. Default generation of "standard" language varieties, which reinforces the idea that there is a "correct" way to communicate and may exacerbate people's existing language biases.
2. Lower quality of service for minoritized language varieties, which makes language models harder to use for speakers of these varieties.
3. Stereotyping or disparaging content when generating minoritized language varieties, which misrepresents the richness of these varieties.
4. Appropriation or manipulation when generating minoritized language varieties, particularly when non-native speakers imitate these varieties, resulting in "language mimicry" and inappropriate imitation of marginalized cultural groups.
5. Preventing the generation of minoritized language varieties, which may lower service quality and even contribute to the erasure of these varieties, further reinforcing language hierarchies.
The paper also notes that the technologists and researchers who build these systems often come from specific social backgrounds, which may lead to bias against, or neglect of, certain language varieties.
The paper calls for addressing the inequalities that this technology creates, in order to move toward more emancipatory digital futures.