Abstract: Drawing parallels between human cognition and artificial intelligence, we explored how large language models (LLMs) internalize identities imposed by targeted prompts. Informed by Social Identity Theory, these identity assignments lead LLMs to distinguish between "we" (the ingroup) and "they" (the outgroup). This self-categorization generates both ingroup favoritism and outgroup bias. Nonetheless, existing literature has predominantly focused on ingroup favoritism, often overlooking outgroup bias, which is a fundamental source of intergroup prejudice and discrimination. Our experiment addresses this gap by demonstrating that outgroup bias manifests as strongly as ingroup favoritism. Furthermore, we successfully mitigated the inherent pro-liberal, anti-conservative bias in LLMs by guiding them to adopt the perspectives of the initially disfavored group. These results were replicated in the context of gender bias. Our findings highlight the potential to develop more equitable and balanced language models.
### Problems the Paper Attempts to Solve
This paper explores how large language models (LLMs) internalize social identities assigned to them through prompts, and the ingroup favoritism and outgroup bias that result. Drawing on Social Identity Theory (SIT), it argues that an LLM given a particular social identity distinguishes between "us" (the ingroup) and "them" (the outgroup), and that this self-categorization produces both ingroup favoritism and outgroup bias.
Existing research focuses mainly on ingroup favoritism and neglects outgroup bias, which is a fundamental source of intergroup prejudice and discrimination. The paper addresses this gap in two ways: it demonstrates experimentally that outgroup bias is as strong as ingroup favoritism, and it proposes mitigating an LLM's inherent bias by guiding the model to adopt the perspective of the initially disfavored group.
### Main Research Objectives
1. **Identify Political Bias**: Evaluate the political bias of LLMs through a series of political value statements without specific identity prompts.
2. **Analyze the Impact of Role-Playing**: Observe the changes in attitudes towards ingroups and outgroups by assigning LLMs a Republican or Democrat identity.
3. **Evaluate Bias Mitigation Methods**: Test whether guiding LLMs to adopt the perspective of the initially disfavored group reduces their inherent biases.
### Experimental Design
- **Experimental Conditions**:
  - **No Identity Baseline**: Observe the default attitudes of LLMs towards political value statements without assigning any specific identity.
  - **Republican Identity**: Observe the attitudes of LLMs towards Republican and Democrat value statements by prompting "You are a Republican."
  - **Democrat Identity**: Observe the attitudes of LLMs towards Republican and Democrat value statements by prompting "You are a Democrat."
- **Measurement Methods**:
  - Use value statements from the political compass test to collect the degree of agreement or disagreement from LLMs for each statement.
  - Encode the answers on a 6-point scale, from -3 (strongly disagree) to +3 (strongly agree).
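The conditions and encoding above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The two partisan prompts are quoted from the summary, but the prompt layout, the six answer labels, and the choice to skip 0 (so that -3 to +3 yields six points) are assumptions, as are the names `IDENTITY_PROMPTS`, `LIKERT_SCORES`, `build_prompt`, and `encode_response`.

```python
from typing import Optional

# Identity prefixes. The two partisan prompts are quoted in the summary;
# the baseline condition simply has no prefix.
IDENTITY_PROMPTS = {
    "baseline": None,
    "republican": "You are a Republican.",
    "democrat": "You are a Democrat.",
}

# ASSUMPTION: six answer labels mapped onto -3..+3 with no neutral 0.
# The source gives only the endpoints of the scale.
LIKERT_SCORES = {
    "strongly disagree": -3,
    "disagree": -2,
    "slightly disagree": -1,
    "slightly agree": 1,
    "agree": 2,
    "strongly agree": 3,
}


def build_prompt(statement: str, identity: str) -> str:
    """Return the text sent to the model for one statement and condition."""
    prefix = IDENTITY_PROMPTS[identity]
    question = (
        f'Statement: "{statement}"\n'
        "Reply with one of: " + ", ".join(LIKERT_SCORES) + "."
    )
    return question if prefix is None else f"{prefix}\n{question}"


def encode_response(text: str) -> Optional[int]:
    """Map a model reply to its numeric score, or None if unrecognized."""
    cleaned = text.strip().strip(".").lower()
    return LIKERT_SCORES.get(cleaned)
```

Returning `None` for unrecognized replies keeps refusals and off-format answers out of the averages instead of silently scoring them as neutral.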
### Main Findings
1. **Political Bias**:
   - Under the no identity baseline condition, GPT-4o exhibits a strong liberal inclination and strong support for Democrat value statements, while showing moderate opposition to conservative and Republican value statements.
   - This bias also appears under other reference conditions (e.g., "human" and "politically independent" identities).
2. **Ingroup Favoritism**:
   - When assigned a Republican identity, GPT-4o's support for Republican value statements significantly increases, showing strong ingroup favoritism.
   - When assigned a Democrat identity, GPT-4o's support for Democrat value statements also significantly increases, but the ingroup favoritism is weaker.
3. **Outgroup Bias**:
   - When assigned a Republican identity, GPT-4o's support for Democrat value statements significantly decreases, showing strong outgroup bias.
   - When assigned a Democrat identity, GPT-4o's support for Republican value statements also significantly decreases, but the outgroup bias is weaker.
4. **Bias Mitigation**:
   - When GPT-4o is assigned the initially disfavored Republican identity, the resulting ingroup favoritism and outgroup bias counterbalance its default leaning, reducing the initial liberal inclination and anti-conservative bias.
   - This method corrects political bias more effectively than general debiasing instructions.
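One way to quantify the two effects described above is as shifts in mean agreement relative to the no identity baseline. The sketch below assumes that definition; the summary does not state the paper's exact statistic, and the function name `identity_effects` and all scores in the usage example are illustrative placeholders, not results from the paper.

```python
from statistics import mean
from typing import Dict, List

Scores = Dict[str, List[int]]  # statement set name -> encoded answers


def identity_effects(
    baseline: Scores, assigned: Scores, ingroup: str, outgroup: str
) -> Dict[str, float]:
    """Shift in mean agreement after an identity is assigned.

    ASSUMPTION: both effects are measured as the difference from the
    no identity baseline. A positive ingroup value indicates ingroup
    favoritism; a negative outgroup value indicates outgroup bias.
    """
    return {
        "ingroup_favoritism": mean(assigned[ingroup]) - mean(baseline[ingroup]),
        "outgroup_bias": mean(assigned[outgroup]) - mean(baseline[outgroup]),
    }


# Placeholder scores for illustration only (NOT data from the paper).
baseline = {"republican": [-1, -1], "democrat": [2, 2]}
as_republican = {"republican": [2, 2], "democrat": [-1, -1]}

effects = identity_effects(baseline, as_republican, "republican", "democrat")
print(effects)
```

With these placeholder scores the ingroup shift is +3 and the outgroup shift is -3. Under this definition, the paper's claim that outgroup bias is as strong as ingroup favoritism corresponds to the two shifts having similar magnitude.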
### Replication Study on Gender Bias
- **Experimental Setup**:
  - Repeat the experiment in the domain of gender bias, keeping the sample size, temperature settings, and response encoding the same as in the political experiment.
  - Observe the attitudes of LLMs towards gender stereotypes by assigning male or female identities.
- **Main Findings**:
  - GPT-4o exhibits a strong preference for females and moderate preference for males under the no identity baseline condition.
  - After assigning a male identity, GPT-4o's preference for females decreases, and preference for males increases.
  - After assigning a female identity, GPT-4o's preference for males decreases, and preference for females increases.
### Conclusion
This paper demonstrates through experiments that outgroup bias in LLMs is as significant as ingroup favoritism.