Abstract: This study investigates whether popular LLMs exhibit bias towards elite universities when generating personas for technology industry professionals. We employed a novel persona-based approach to compare the educational background predictions of GPT-3.5, Gemini, and Claude 3 Sonnet with actual data from LinkedIn. The study focused on various roles at Microsoft, Meta, and Google, including VP Product, Director of Engineering, and Software Engineer. We generated 432 personas across the three LLMs and analyzed the frequency of elite universities (Stanford, MIT, UC Berkeley, and Harvard) in these personas compared to LinkedIn data. Results showed that LLMs significantly overrepresented elite universities, featuring these universities 72.45% of the time, compared to only 8.56% in the actual LinkedIn data. ChatGPT 3.5 exhibited the highest bias, followed by Claude 3 Sonnet, while Gemini performed best. This research highlights the need to address educational bias in LLMs and suggests strategies for mitigating such biases in AI-driven recruitment processes.

What problem does this paper attempt to address?

This paper asks whether large language models (LLMs) favor elite universities when they generate personas of technology-industry professionals. Specifically, it addresses three research questions:

1. **Do LLMs exhibit a bias towards elite universities when generating personas of technology-industry professionals?**
2. **Are all LLMs equally biased towards elite universities?**
3. **Does the bias towards elite universities persist across different occupational levels in the technology industry?**

### Research Background

As LLMs such as GPT-3.5, Gemini, and Claude 3 Sonnet are widely applied to natural language processing tasks, concern about the biases in these models is growing. Such biases may stem from unbalanced training data or from social stereotypes, and they carry over into model output. When these models are used in recruitment, the biases may lead to unfair hiring practices and limit opportunities for talented candidates from other educational backgrounds.

### Research Methods

The researchers adopted a persona-based method, comparing the educational backgrounds that three LLMs (GPT-3.5, Gemini, and Claude 3 Sonnet) predicted for technology-industry professionals with actual data from LinkedIn. The steps were as follows:

1. **Data Collection**: Collected real educational background data for employees of Microsoft, Meta, and Google from LinkedIn.
2. **Persona Generation**: Generated 432 personas with the three LLMs through prompt engineering, covering six different roles (software engineer, product manager, engineering director, etc.).
3. **Bias Assessment**: Calculated the frequency of elite universities (Stanford, MIT, University of California, Berkeley, and Harvard) in the generated personas and compared it with the actual LinkedIn data.

### Main Findings

- **Overall Bias**: LLMs significantly overrepresent elite universities. In the generated personas, 72.45% of educational backgrounds are from elite universities, while in the actual LinkedIn data the proportion is only 8.56%.
- **Model Differences**: ChatGPT 3.5 shows the highest bias (116.67%), followed by Claude 3 Sonnet (72.92%), while Gemini performs best (27.78%).
- **Occupational Level Analysis**: The bias appears across occupational levels, and ChatGPT 3.5 shows the strongest bias at every level.

### Conclusion

The research shows that LLMs do favor elite universities when generating personas of technology-industry professionals, and that the bias varies across models and occupational levels. The results emphasize the need to reduce these biases in AI-driven recruitment processes to ensure fairness and inclusiveness.

### Formula Explanation

To quantify the bias, the research used the following formulas:

- **Baseline Indicator \(M_b\)**:
  \[ M_b=\frac{N_e}{P_t} \]
  where \(N_e\) is the number of occurrences of elite universities in the LinkedIn data for a specific role/occupational level, and \(P_t\) is the total number of members in that role/occupational level.
- **Evaluation Indicator \(M_e^*\)**:
  \[ M_e^*=\frac{N_e^*}{P_t^*} \]
  where \(N_e^*\) is the number of occurrences of elite universities in the personas generated by the model, and \(P_t^*\) is the total number of personas for that role/occupational level.

Comparing \(M_e^*\) with \(M_b\) shows whether the model is biased: \(M_e^*\gg M_b\) indicates the presence of bias, while \(M_e^*\approx M_b\) indicates no significant bias.
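The two indicators can be illustrated with a minimal Python sketch. Everything here beyond the formulas is an assumption made for illustration: the function names, the regular expressions used to recognize the four elite universities, the rule that each education entry naming an elite university counts as one occurrence, and the sample personas and baseline counts, none of which come from the paper. Under that counting rule a persona with two elite degrees contributes twice, which would be consistent with a reported value above 100%.

```python
import re

# Patterns for the four elite universities named in the study.
# The exact spellings matched here are an assumption, not the paper's rule.
ELITE_PATTERNS = [
    re.compile(r"\bstanford\b", re.IGNORECASE),
    re.compile(r"\bmit\b|\bmassachusetts institute of technology\b", re.IGNORECASE),
    re.compile(r"\bberkeley\b", re.IGNORECASE),
    re.compile(r"\bharvard\b", re.IGNORECASE),
]


def count_elite_occurrences(education_entries):
    """Count the education entries that name at least one elite university."""
    return sum(
        1
        for entry in education_entries
        if any(pattern.search(entry) for pattern in ELITE_PATTERNS)
    )


def indicator(n_elite, total):
    """Shared ratio form of M_b = N_e / P_t and M_e* = N_e* / P_t*."""
    if total <= 0:
        raise ValueError("total must be positive")
    return n_elite / total


def evaluation_indicator(personas):
    """M_e*: elite occurrences across all personas divided by persona count."""
    n_elite = sum(count_elite_occurrences(entries) for entries in personas)
    return indicator(n_elite, len(personas))


# Hypothetical personas for one role; each persona is a list of education entries.
sample_personas = [
    ["BS Computer Science, Stanford University", "MBA, Harvard Business School"],
    ["BS Computer Science, University of Washington"],
    ["MS Electrical Engineering, MIT"],
    ["BSc Computer Engineering, Georgia Tech"],
]

m_e_star = evaluation_indicator(sample_personas)  # 3 occurrences over 4 personas
m_b = indicator(10, 100)  # hypothetical baseline: 10 occurrences among 100 members

print(f"M_e* = {m_e_star:.2%}, M_b = {m_b:.2%}, ratio = {m_e_star / m_b:.1f}x")
```

Applied to the headline figures in the abstract, the same comparison gives 72.45% against 8.56%, an overrepresentation of roughly 8.5 times.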

Evaluation of LLMs Biases Towards Elite Universities: A Persona-Based Exploration

Popular LLMs Amplify Race and Gender Disparities in Human Mobility

Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs

LLMs are Biased Teachers: Evaluating LLM Bias in Personalized Education

Are Emily and Greg Still More Employable than Lakisha and Jamal? Investigating Algorithmic Hiring Bias in the Era of ChatGPT

The Silicon Ceiling: Auditing GPT's Race and Gender Biases in Hiring

Revealing Hidden Bias in AI: Lessons from Large Language Models

PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits

Evaluating LLMs for Gender Disparities in Notable Persons

Evaluation of Bias Towards Medical Professionals in Large Language Models

Evaluating Gender, Racial, and Age Biases in Large Language Models: A Comparative Analysis of Occupational and Crime Scenarios

The Unequal Opportunities of Large Language Models: Revealing Demographic Bias through Job Recommendations

Profiling Bias in LLMs: Stereotype Dimensions in Contextual Word Embeddings

From Bytes to Biases: Investigating the Cultural Self-Perception of Large Language Models

Unveiling Gender Bias in Terms of Profession Across LLMs: Analyzing and Addressing Sociological Implications

Gender Bias in LLM-generated Interview Responses

Assessing Gender Bias in LLMs: Comparing LLM Outputs with Human Perceptions and Official Statistics

With a Grain of SALT: Are LLMs Fair Across Social Dimensions?

PersonaLLM: Investigating the Ability of GPT-3.5 to Express Personality Traits and Gender Differences

Exploring Changes in Nation Perception with Nationality-Assigned Personas in LLMs

Investigating Subtler Biases in LLMs: Ageism, Beauty, Institutional, and Nationality Bias in Generative Models