Understanding Intrinsic Socioeconomic Biases in Large Language Models

Mina Arzaghi, Florian Carichon, Golnoosh Farnadi
2024-05-29
Abstract: Large Language Models (LLMs) are increasingly integrated into critical decision-making processes, such as loan approvals and visa applications, where inherent biases can lead to discriminatory outcomes. In this paper, we examine the nuanced relationship between demographic attributes and socioeconomic biases in LLMs, a crucial yet understudied area of fairness in LLMs. We introduce a novel dataset of one million English sentences to systematically quantify socioeconomic biases across various demographic groups. Our findings reveal pervasive socioeconomic biases in both established models such as GPT-2 and state-of-the-art models like Llama 2 and Falcon. We demonstrate that these biases are significantly amplified when considering intersectionality, with LLMs exhibiting a remarkable capacity to extract multiple demographic attributes from names and then correlate them with specific socioeconomic biases. This research highlights the urgent necessity for proactive and robust bias mitigation techniques to safeguard against discriminatory outcomes when deploying these powerful models in critical real-world applications.
Computation and Language, Computers and Society, Machine Learning
What problem does this paper attempt to address?
This paper addresses intrinsic socioeconomic biases in large language models (LLMs) that are used in critical decision-making. The researchers focus on the discriminatory outcomes such models can produce in high-stakes settings such as loan approvals and visa applications. To quantify socioeconomic biases across demographic groups systematically, the authors introduce a new dataset of one million English sentences and use it to evaluate several LLMs, including GPT-2, Llama 2, and Falcon, in which they find pervasive socioeconomic biases. The study also finds that these biases are significantly amplified under intersectionality: LLMs can extract multiple demographic attributes from a name and associate them with specific socioeconomic biases. The research therefore emphasizes the urgent need for proactive, robust bias mitigation techniques to prevent discriminatory outcomes when these models are deployed in real-world applications. In summary, the paper targets an understudied question in LLM fairness: how demographic attributes relate to, and amplify, socioeconomic biases.