Abstract: This study investigates whether popular LLMs exhibit bias towards elite universities when generating personas for technology industry professionals. We employed a novel persona-based approach to compare the educational background predictions of GPT-3.5, Gemini, and Claude 3 Sonnet with actual data from LinkedIn. The study focused on various roles at Microsoft, Meta, and Google, including VP Product, Director of Engineering, and Software Engineer. We generated 432 personas across the three LLMs and analyzed the frequency of elite universities (Stanford, MIT, UC Berkeley, and Harvard) in these personas compared to LinkedIn data. Results showed that LLMs significantly overrepresented elite universities, featuring these universities 72.45% of the time, compared to only 8.56% in the actual LinkedIn data. ChatGPT 3.5 exhibited the highest bias, followed by Claude Sonnet 3, while Gemini performed best. This research highlights the need to address educational bias in LLMs and suggests strategies for mitigating such biases in AI-driven recruitment processes.
What problem does this paper attempt to address?
This paper asks whether large language models (LLMs) favor elite universities when they generate personas of technology industry professionals. It poses three research questions:
1. **Do LLMs show a bias towards elite universities when generating personas of technology industry professionals?**
2. **Are all LLMs equally biased towards elite universities?**
3. **Does this bias appear at different occupational levels in the technology industry?**
### Research Background
With large language models such as GPT-3.5, Gemini, and Claude Sonnet 3 now widely used for natural language processing tasks, concern about bias in these models is growing. Such biases may stem from unbalanced training data or social stereotypes, and they carry through to the models' output. When the models are used in recruitment, bias may lead to unfair hiring practices and limit opportunities for talented candidates from different educational backgrounds.
### Research Methods
The researchers adopted a persona-based method, comparing the educational backgrounds that three large language models (GPT-3.5, Gemini, and Claude Sonnet 3) predicted for technology industry professionals with actual data from LinkedIn. The specific steps are as follows:
1. **Data Collection**: Collected real educational background data for employees of Microsoft, Meta, and Google from LinkedIn.
2. **Persona Generation**: Using prompt engineering, generated 432 personas with the three large language models, covering six different roles (software engineer, product manager, engineering director, etc.).
3. **Bias Assessment**: Calculated how often elite universities (Stanford, MIT, University of California, Berkeley, and Harvard) appeared in the generated personas and compared that frequency with the actual LinkedIn data.
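The counting part of the bias assessment step can be sketched in Python. This is only an illustration: the paper's actual matching procedure is not described here, so the regular expressions, the `count_elite_occurrences` helper, and the sample education strings are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical patterns for the four elite universities named in the study.
# Word boundaries keep "MIT" from matching inside words such as "Smith".
ELITE_PATTERNS = {
    "Stanford": re.compile(r"\bstanford\b", re.IGNORECASE),
    "MIT": re.compile(r"\bmit\b|massachusetts institute of technology", re.IGNORECASE),
    "UC Berkeley": re.compile(r"\bberkeley\b", re.IGNORECASE),
    "Harvard": re.compile(r"\bharvard\b", re.IGNORECASE),
}


def count_elite_occurrences(education_entries):
    """Count how many education entries mention each elite university."""
    counts = Counter()
    for entry in education_entries:
        for name, pattern in ELITE_PATTERNS.items():
            if pattern.search(entry):
                counts[name] += 1
    return counts


# Made-up education lines, standing in for persona or LinkedIn records.
sample = [
    "BS Computer Science, Stanford University",
    "MS Electrical Engineering, University of California, Berkeley",
    "BS Computer Science, University of Washington",
    "MBA, Harvard Business School",
]

counts = count_elite_occurrences(sample)
total_elite = sum(counts.values())
print(dict(counts), total_elite)
```

The same function could be applied to both the generated personas and the LinkedIn records, so that the two frequencies are counted in the same way.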
### Main Findings
- **Overall Bias**: Large language models significantly overrepresent elite universities. In the personas generated by the models, 72.45% of the educational backgrounds are from elite universities, while in the actual LinkedIn data, this proportion is only 8.56%.
- **Model Differences**: ChatGPT 3.5 shows the highest bias (116.67%), followed by Claude Sonnet 3 (72.92%), and Gemini performs the best (27.78%).
- **Occupational Level Analysis**: Bias is widespread in different occupational levels, and ChatGPT 3.5 shows the strongest bias in all levels.
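A value above 100% for ChatGPT 3.5 suggests that the figures count occurrences of elite universities per persona, so a persona that lists more than one elite university is counted more than once. The short check below explores that reading. It assumes the 432 personas were split evenly across the three models (144 each), which this summary does not state, so the implied counts are an inference and not reported data.

```python
# Figures reported in the findings above.
total_personas = 432
overall_reported = 72.45   # percent, all models combined
linkedin_reported = 8.56   # percent, LinkedIn baseline
reported = {"ChatGPT 3.5": 116.67, "Claude Sonnet 3": 72.92, "Gemini": 27.78}

# ASSUMPTION (not stated in the source): an even split across the three models.
per_model = total_personas // len(reported)  # 144

# Occurrence counts implied by each per-model percentage under that assumption.
implied_counts = {
    model: round(pct / 100 * per_model) for model, pct in reported.items()
}

# Recombine the implied counts and compare with the reported overall figure.
overall_implied = sum(implied_counts.values()) / total_personas * 100

# How many times larger the generated rate is than the LinkedIn baseline.
overrepresentation = overall_reported / linkedin_reported

print(implied_counts)
print(f"overall implied: {overall_implied:.2f}%")
print(f"overrepresentation factor: {overrepresentation:.2f}")
```

Under the even-split assumption the per-model percentages correspond to whole-number counts that recombine to the reported overall 72.45%, and the generated rate is roughly 8.5 times the LinkedIn baseline.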
### Conclusion
The research shows that large language models do have a bias towards elite universities when generating personas of professionals in the technology industry, and that this bias varies across models and occupational levels. The results emphasize the need to reduce these biases in AI-driven recruitment processes to ensure fairness and inclusiveness.
### Formula Explanation
To quantify the bias, the following formulas were used in the research:
- **Baseline Indicator \(M_b\)**:
\[
M_b=\frac{N_e}{P_t}
\]
where \(N_e\) is the number of occurrences of elite universities in the LinkedIn data for a specific role/occupational level, and \(P_t\) is the total number of LinkedIn members in that role/occupational level.
- **Evaluation Indicator \(M_e^*\)**:
\[
M_e^*=\frac{N_e^*}{P_t^*}
\]
where \(N_e^*\) is the number of occurrences of elite universities in the personas generated by the model, and \(P_t^*\) is the total number of personas in a specific role/occupational level.
By comparing \(M_e^*\) and \(M_b\), it can be determined whether the model is biased. If \(M_e^*\gg M_b\), it indicates the presence of bias; if \(M_e^*\approx M_b\), it indicates that there is no significant bias.
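As a minimal sketch of the comparison, the two metrics can be computed for a single role as follows. The counts are invented for illustration and do not come from the paper, and the `elite_rate` helper is a hypothetical name. The source does not give a numeric threshold for \(M_e^*\gg M_b\), so the sketch only reports the gap and the ratio.

```python
def elite_rate(elite_occurrences: int, total: int) -> float:
    """Occurrences of elite universities divided by the group size."""
    if total <= 0:
        raise ValueError("total must be positive")
    return elite_occurrences / total


# Hypothetical counts for one role (illustrative only, not from the paper).
baseline_elite, baseline_members = 12, 150    # N_e, P_t from LinkedIn
generated_elite, generated_personas = 18, 24  # N_e^*, P_t^* from one model

m_b = elite_rate(baseline_elite, baseline_members)           # M_b
m_e_star = elite_rate(generated_elite, generated_personas)   # M_e^*

gap = m_e_star - m_b    # difference between the two rates
ratio = m_e_star / m_b  # how many times larger the generated rate is

print(f"M_b = {m_b:.2%}, M_e* = {m_e_star:.2%}")
print(f"gap = {gap:.2%}, ratio = {ratio:.2f}x")
```

Because both metrics count occurrences, \(M_e^*\) can exceed 1 when personas list more than one elite university, so the ratio is often easier to interpret than either rate on its own.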