Measuring Gender and Racial Biases in Large Language Models

Jiafu An, Difang Huang, Chen Lin, Mingzhu Tai

2024-03-22

Abstract: In traditional decision-making processes, social biases of human decision makers can lead to unequal economic outcomes for underrepresented social groups, such as women and racial or ethnic minorities. Recently, the increasing popularity of large language model (LLM)-based artificial intelligence suggests a potential transition from human to AI-based decision making. How would this impact the distributional outcomes across social groups? Here we investigate the gender and racial biases of OpenAI's GPT, a widely used LLM, in a high-stakes decision-making setting, specifically assessing entry-level job candidates from diverse social groups. Instructing GPT to score approximately 361,000 resumes with randomized social identities, we find that the LLM awards higher assessment scores to female candidates with similar work experience, education, and skills, and lower scores to Black male candidates with comparable qualifications. These biases may result in a 1-2 percentage point difference in hiring probabilities for otherwise similar candidates at a certain threshold and are consistent across various job positions and subsamples. Meanwhile, we also find stronger pro-female and weaker anti-Black-male patterns in Democratic states. Our results demonstrate that this LLM-based AI system has the potential to mitigate gender bias, but it may not necessarily cure racial bias. Further research is needed to comprehend the root causes of these outcomes and develop strategies to minimize the remaining biases in AI systems. As AI-based decision-making tools are increasingly employed across diverse domains, our findings underscore the necessity of understanding and addressing potential disparities to ensure equitable outcomes across social groups.
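The randomized-identity design described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the name pools, the `assign_identity` and `mean_score_by_group` helpers, and especially `placeholder_scorer`, which stands in for the model call and deliberately ignores the name. The paper's actual prompts, names, and scoring procedure are not given in this text.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical name pools signalling (gender, race); not the names used in the study.
NAME_POOLS = {
    ("female", "white"): ["Emily Walsh", "Anne Baker"],
    ("male", "white"): ["Greg Walsh", "Todd Baker"],
    ("female", "black"): ["Lakisha Washington", "Tamika Jefferson"],
    ("male", "black"): ["Jamal Washington", "Darnell Jefferson"],
}


def assign_identity(resume_body, rng):
    """Attach a randomly drawn name to an otherwise unchanged resume body."""
    group = rng.choice(sorted(NAME_POOLS))
    name = rng.choice(NAME_POOLS[group])
    gender, race = group
    return {"gender": gender, "race": race, "text": f"Name: {name}\n{resume_body}"}


def placeholder_scorer(resume_text):
    """Hypothetical stand-in for the model call: scores the body only, never the name."""
    body = resume_text.split("\n", 1)[1]
    return min(100, 40 + len(body.split()))


def mean_score_by_group(resume_bodies, scorer, seed=0):
    """Randomize identities, score every resume, and average scores per group."""
    rng = random.Random(seed)
    scores = defaultdict(list)
    for body in resume_bodies:
        resume = assign_identity(body, rng)
        scores[(resume["gender"], resume["race"])].append(scorer(resume["text"]))
    return {group: mean(values) for group, values in scores.items()}
```

Because identities are assigned at random, qualifications are balanced across groups in expectation, so a scorer that ignores the name should produce no systematic gap between group means. A persistent gap from a real scorer would therefore point to the identity signal itself.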

General Economics

What problem does this paper attempt to address?

The problem this paper attempts to address is whether large language models (LLMs) like OpenAI's GPT exhibit gender and racial biases when evaluating candidates from different social groups in high-stakes decision-making scenarios, and how these biases affect hiring decisions. Specifically, the researchers explore the following questions by generating a large number of random resumes and using GPT to score them:

1. **Do gender and racial identities affect GPT's evaluation scores of resumes?**
2. **Are these biases consistent across different job types and regions?**
3. **Do these biases lead to significant differences in the likelihood of candidates from different social groups being hired?**

The research findings indicate that GPT does exhibit certain gender and racial biases when evaluating resumes, which may result in generally higher scores for female candidates and lower scores for Black male candidates. These differences are relatively consistent across different job types and regions and may lead to a 1-2 percentage point difference in hiring probabilities in some cases. Additionally, the study found that these biases manifest differently in Democratic and Republican states, with a stronger "pro-female" bias and a weaker "anti-Black male" bias in Democratic states.

Overall, this paper aims to reveal and quantify the social biases of large language models in high-stakes decision-making, providing empirical evidence for understanding and addressing these biases.
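To make the link between score differences and hiring probabilities concrete, here is a minimal sketch that assumes a candidate is "hired" when the score meets a fixed cutoff. The helper names, the cutoff of 80, and the score lists are all made up for illustration; they are not data or methods from the paper.

```python
def hire_rate(scores, threshold):
    """Share of candidates whose score meets or exceeds the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def hire_gap_pp(group_scores, reference_scores, threshold):
    """Hiring-rate gap in percentage points (group minus reference)."""
    gap = hire_rate(group_scores, threshold) - hire_rate(reference_scores, threshold)
    return 100 * gap


# Made-up scores for two groups with otherwise similar qualifications.
reference = [70, 72, 75, 78, 80, 82, 85, 88, 90, 92]
group = [70, 72, 75, 80, 80, 82, 85, 88, 90, 92]  # one score nudged from 78 to 80

gap_at_80 = hire_gap_pp(group, reference, threshold=80)
gap_at_90 = hire_gap_pp(group, reference, threshold=90)
```

In this toy data the gap is nonzero at a cutoff of 80 and vanishes at 90, which shows why a hiring-probability difference has to be reported "at a certain threshold": the same score shift matters only where candidates sit close to the cutoff.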

The Silicon Ceiling: Auditing GPT's Race and Gender Biases in Hiring

Gender Bias in Large Language Models across Multiple Languages

The Unequal Opportunities of Large Language Models: Revealing Demographic Bias through Job Recommendations

Evaluating Gender, Racial, and Age Biases in Large Language Models: A Comparative Analysis of Occupational and Crime Scenarios

Gender bias and stereotypes in Large Language Models

JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models

Revealing Hidden Bias in AI: Lessons from Large Language Models

Unveiling Gender Bias in Terms of Profession Across LLMs: Analyzing and Addressing Sociological Implications

Public Perceptions of Gender Bias in Large Language Models: Cases of ChatGPT and Ernie

Do Large Language Models Discriminate in Hiring Decisions on the Basis of Race, Ethnicity, and Gender?

AI Gender Bias, Disparities, and Fairness: Does Training Data Matter?

Hire Me or Not? Examining Language Model's Behavior with Occupation Attributes

Unveiling Gender Bias in Large Language Models: Using Teacher's Evaluation in Higher Education As an Example

ChatGPT Exhibits Gender and Racial Biases in Acute Coronary Syndrome Management

Are Emily and Greg Still More Employable than Lakisha and Jamal? Investigating Algorithmic Hiring Bias in the Era of ChatGPT

"You Gotta be a Doctor, Lin": An Investigation of Name-Based Bias of Large Language Models in Employment Recommendations

Gender Bias of LLM in Economics: An Existentialism Perspective

Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks

Investigating Implicit Bias in Large Language Models: A Large-Scale Study of Over 50 LLMs