Nigerian Software Engineer or American Data Scientist? GitHub Profile Recruitment Bias in Large Language Models

Takashi Nakano, Kazumasa Shimari, Raula Gaikovina Kula, Christoph Treude, Marc Cheong, Kenichi Matsumoto
2024-09-19
Abstract: Large Language Models (LLMs) have taken the world by storm, demonstrating their ability not only to automate tedious tasks, but also to show some degree of proficiency in completing software engineering tasks. A key concern with LLMs is their "black-box" nature, which obscures their internal workings and could lead to societal biases in their outputs. In the software engineering context, in this early results paper, we empirically explore how well LLMs can automate recruitment tasks for a geographically diverse software team. We use OpenAI's ChatGPT to conduct an initial set of experiments using GitHub User Profiles from four regions to recruit a six-person software development team, analyzing a total of 3,657 profiles over a five-year period (2019-2023). Results indicate that ChatGPT shows preference for some regions over others, even when swapping the location strings of two profiles (counterfactuals). Furthermore, ChatGPT was more likely to assign certain developer roles to users from a specific country, revealing an implicit bias. Overall, this study reveals insights into the inner workings of LLMs and has implications for mitigating such societal biases in these models.
Software Engineering
What problem does this paper attempt to address?
The problem that this paper attempts to address is the geographical and social bias that large language models (LLMs) may exhibit when used to recruit software teams. Specifically, the researchers are concerned with:

1. **Location Bias**: When selecting team members from candidates in different regions, does the LLM prefer certain regions? For example, is it more likely to choose developers from specific countries or regions?
2. **Team Role Bias**: Is the LLM biased when assigning roles to team members? For example, is it more likely to assign certain roles to developers from specific countries?
3. **Location Counterfactuals**: Does changing a candidate's location information affect the LLM's recruitment decision? For example, if a candidate's location is changed from one country to another, does the model's decision change accordingly?

To investigate these questions, the researchers ran experiments with OpenAI's ChatGPT on GitHub user profiles, analyzing 3,657 profiles from four regions (the United States, India, Nigeria, and Poland). Through these experiments, the researchers aim to reveal potential biases of LLMs in recruitment and to outline future research directions for mitigating them.

### Specific Research Questions

- **RQ1**: How does the LLM select team members from different geographical locations? Does the model prefer certain regions?
- **RQ2**: How does the LLM assign roles within the team? Is the model biased when assigning roles?
- **RQ3**: How does a candidate's geographical location affect the LLM's recruitment decision? Can the model detect changes in location information?
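The location counterfactual behind RQ3, in which the location strings of two profiles are swapped while everything else stays fixed, can be sketched in Python as below. This is a minimal illustration under stated assumptions: the `Profile` fields, the `swap_locations` and `render` helpers, and the plain-text prompt format are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Profile:
    """Hypothetical, simplified view of a GitHub user profile (fields are assumptions)."""
    login: str
    location: str
    bio: str


def swap_locations(a: Profile, b: Profile) -> tuple[Profile, Profile]:
    """Return counterfactual copies of a and b with only their location strings exchanged."""
    # dataclasses.replace copies every other field unchanged, so each new
    # profile differs from its original in the location string alone.
    return replace(a, location=b.location), replace(b, location=a.location)


def render(profile: Profile) -> str:
    """Format a profile as plain text for a prompt (hypothetical format)."""
    return f"Username: {profile.login}\nLocation: {profile.location}\nBio: {profile.bio}"


if __name__ == "__main__":
    # Toy profiles for illustration only; not data from the study.
    first = Profile("dev_a", "Lagos, Nigeria", "Backend developer, Go and PostgreSQL")
    second = Profile("dev_b", "Austin, TX, United States", "Python, pandas, ML pipelines")
    first_cf, second_cf = swap_locations(first, second)
    print(render(first_cf))
```

Comparing the model's picks for the original pair against the swapped pair isolates the effect of the location string, since no other field differs between them.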
### Research Background and Motivation

As large language models are increasingly applied in software engineering, including in recruitment, it is important to understand and evaluate whether these models carry social biases. Through this research, the researchers aim to shed light on the inner workings of LLMs and to provide guidance for mitigating these biases, thereby supporting fairness and diversity in recruitment.

### Experimental Design

The researchers selected four geographical regions (the United States, India, Nigeria, and Poland) and used the GitHub API to collect user profiles created between 2019 and 2023. From each region, 100 user profiles were randomly selected to form a balanced data set. The researchers then used ChatGPT to run a series of experiments simulating the recruitment of a six-person software development team, recording which candidates the model selected and which roles it assigned.

### Main Findings

- **RQ1**: The LLM does show geographical bias when selecting team members. For example, candidates from Nigeria and Poland are selected more frequently, while candidates from the United States are selected less frequently.
- **RQ2**: The LLM shows clear preferences in role assignment. For example, Americans are more likely to be assigned as data scientists, while Nigerians are more likely to be assigned as software engineers.
- **RQ3**: Changing a candidate's geographical location significantly changes the LLM's recruitment decision. For example, when the location of candidates from other regions is changed to the United States, their selection rate increases significantly.

### Conclusions and Future Research Directions

The researchers point out that although LLMs perform well on automated tasks, they may exhibit serious social biases in recruitment.
Future research should further explore how to adjust these models to mitigate biases and propose effective strategies to promote diversity and inclusiveness.
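The balanced sampling step from the Experimental Design and the per-region tallies behind the RQ1 and RQ2 findings can be sketched as follows. This is a hedged sketch, not the authors' analysis code: the `Trial` structure, the helper names, and the definition of a selection rate as picks divided by appearances are assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical trial record: (login -> region for every candidate shown,
#                             login -> role for each candidate the model picked).
Trial = tuple[dict[str, str], dict[str, str]]


def balanced_sample(pool: dict[str, list[str]], per_region: int = 100, seed: int = 0) -> dict[str, list[str]]:
    """Draw the same number of logins from every region, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return {region: rng.sample(logins, per_region) for region, logins in pool.items()}


def selection_rates(trials: list[Trial]) -> dict[str, float]:
    """Per region: times a candidate was picked / times a candidate was shown (assumed metric)."""
    shown: Counter[str] = Counter()
    picked: Counter[str] = Counter()
    for regions, roles in trials:
        shown.update(regions.values())
        picked.update(regions[login] for login in roles)
    # Counter returns 0 for regions that were shown but never picked.
    return {region: picked[region] / shown[region] for region in shown}


def role_table(trials: list[Trial]) -> Counter[tuple[str, str]]:
    """Count how often each (region, role) pair was assigned across all trials."""
    table: Counter[tuple[str, str]] = Counter()
    for regions, roles in trials:
        for login, role in roles.items():
            table[(regions[login], role)] += 1
    return table
```

`selection_rates` gives each region's share of appearances that ended in a pick, and `role_table` counts pairs such as `("United States", "Data Scientist")`, which is the kind of region-by-role comparison RQ2 is concerned with.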