Abstract: Large Language Models (LLMs) have taken the world by storm, demonstrating their ability not only to automate tedious tasks, but also to show some degree of proficiency in completing software engineering tasks. A key concern with LLMs is their "black-box" nature, which obscures their internal workings and could lead to societal biases in their outputs. In the software engineering context, in this early results paper, we empirically explore how well LLMs can automate recruitment tasks for a geographically diverse software team. We use OpenAI's ChatGPT to conduct an initial set of experiments using GitHub User Profiles from four regions to recruit a six-person software development team, analyzing a total of 3,657 profiles over a five-year period (2019-2023). Results indicate that ChatGPT shows preference for some regions over others, even when swapping the location strings of two profiles (counterfactuals). Furthermore, ChatGPT was more likely to assign certain developer roles to users from a specific country, revealing an implicit bias. Overall, this study reveals insights into the inner workings of LLMs and has implications for mitigating such societal biases in these models.

What problem does this paper attempt to address?

This paper addresses possible geographical and social bias when large language models (LLMs) are used to recruit software teams. Specifically, the researchers are concerned with:

1. **Location Bias**: When selecting team members from candidates in different regions, does the LLM show a preference for certain regions? For example, is it more likely to choose developers from specific countries or regions?
2. **Team Role Bias**: Is the LLM biased when assigning roles to team members? For example, is it more likely to assign certain roles to developers from specific countries?
3. **Counterfactual of Location**: When a candidate's geographical location information is changed, is the LLM's recruitment decision affected? For example, if a candidate's location is changed from one country to another, does the LLM detect this change and adjust its decision?

To investigate these questions, the researchers used OpenAI's ChatGPT model and conducted experiments based on GitHub user profiles. They analyzed 3,657 user profiles from four different regions (the United States, India, Nigeria, and Poland). Through these experiments, the researchers hope to reveal the potential biases of LLMs in the recruitment process and propose future research directions to mitigate these biases.

### Specific Research Questions

- **RQ1**: How does the LLM select team members according to different geographical locations? Does the model show a preference for certain regions?
- **RQ2**: How does the LLM assign roles within the team? Is the model biased when assigning roles?
- **RQ3**: How does the geographical location of candidates affect the LLM's recruitment decision? Can the model detect changes in location information?

### Research Background and Motivation

As large language models are increasingly applied in software engineering, and in recruitment in particular, it is important to understand and evaluate whether these models carry social biases. The researchers hope to shed light on the inner workings of LLMs through this research and provide guidance for mitigating these biases, thereby supporting fairness and diversity in the recruitment process.

### Experimental Design

The researchers selected four geographical regions (the United States, India, Nigeria, and Poland) and collected user profiles created between 2019 and 2023 through the GitHub API. 100 user profiles were randomly selected from each region to form a balanced data set. The researchers then used the ChatGPT model to conduct a series of experiments simulating the recruitment of a six-person software development team, and recorded the model's selections and role assignments.

### Main Findings

- **RQ1**: The results show that the LLM does exhibit geographical bias when selecting team members. For example, candidates from Nigeria and Poland are selected more frequently, while candidates from the United States are selected less frequently.
- **RQ2**: In role assignment, the LLM shows clear preferences. For example, Americans are more likely to be assigned as data scientists, while Nigerians are more likely to be assigned as software engineers.
- **RQ3**: When the geographical location of candidates is changed, the LLM's recruitment decision changes significantly. For example, when the location of candidates from other regions is changed to the United States, their selection rate increases significantly.

### Conclusions and Future Research Directions

The researchers point out that although LLMs perform well in automated tasks, they may have serious social biases in the recruitment process. Future research should further explore how to adjust these models to mitigate biases and propose effective strategies to promote diversity and inclusiveness.
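
The experimental design summarized above (a balanced sample per region, a counterfactual swap of two profiles' location strings, and a per-region tally of who gets selected) can be illustrated with a minimal Python sketch. This is not the authors' code: the `Profile` fields and the helper names `balanced_sample`, `swap_locations`, and `selection_rates` are hypothetical, the call to ChatGPT is deliberately left out, and the sketch assumes each profile's location has already been normalized to one of the four region names.

```python
import random
from collections import Counter
from dataclasses import dataclass, replace

# The four regions named in the summary.
REGIONS = ["United States", "India", "Nigeria", "Poland"]


@dataclass(frozen=True)
class Profile:
    """Hypothetical, simplified stand-in for a GitHub user profile."""
    login: str
    location: str
    bio: str


def balanced_sample(profiles, per_region, seed=0):
    """Draw the same number of profiles from every region (assumed design)."""
    rng = random.Random(seed)
    sample = []
    for region in REGIONS:
        pool = [p for p in profiles if p.location == region]
        sample.extend(rng.sample(pool, per_region))
    return sample


def swap_locations(a, b):
    """Counterfactual pair: identical profiles except for swapped locations."""
    return replace(a, location=b.location), replace(b, location=a.location)


def selection_rates(teams):
    """Share of all selected team members that comes from each region."""
    counts = Counter(p.location for team in teams for p in team)
    total = sum(counts.values())
    if total == 0:
        return {region: 0.0 for region in REGIONS}
    return {region: counts[region] / total for region in REGIONS}
```

In a full experiment, each sampled batch (and its counterfactual variant) would be sent to the model with a recruitment prompt, and the returned teams would be passed to `selection_rates` so the original and swapped conditions can be compared.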

Nigerian Software Engineer or American Data Scientist? GitHub Profile Recruitment Bias in Large Language Models
