Abstract: Drawing parallels between human cognition and artificial intelligence, we explored how large language models (LLMs) internalize identities imposed by targeted prompts. Informed by Social Identity Theory, these identity assignments lead LLMs to distinguish between "we" (the ingroup) and "they" (the outgroup). This self-categorization generates both ingroup favoritism and outgroup bias. Nonetheless, existing literature has predominantly focused on ingroup favoritism, often overlooking outgroup bias, which is a fundamental source of intergroup prejudice and discrimination. Our experiment addresses this gap by demonstrating that outgroup bias manifests as strongly as ingroup favoritism. Furthermore, we successfully mitigated the inherent pro-liberal, anti-conservative bias in LLMs by guiding them to adopt the perspectives of the initially disfavored group. These results were replicated in the context of gender bias. Our findings highlight the potential to develop more equitable and balanced language models.

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper explores how large language models (LLMs) internalize social identities assigned to them, and the ingroup favoritism and outgroup bias that result. Specifically, the paper uses Social Identity Theory (SIT) to explain that when LLMs are given a particular social identity, they distinguish between "us" (the ingroup) and "them" (the outgroup), and that this self-categorization leads to ingroup favoritism and outgroup bias. Existing research focuses mainly on ingroup favoritism and neglects outgroup bias, which is a fundamental source of intergroup prejudice, hostility, and social exclusion. The paper addresses this gap by showing experimentally that outgroup bias is as strong as ingroup favoritism, and it proposes mitigating inherent biases by guiding LLMs to adopt the perspective of the initially disfavored group.

### Main Research Objectives

1. **Identify Political Bias**: Evaluate the political bias of LLMs through a series of political value statements without specific identity prompts.
2. **Analyze the Impact of Role-Playing**: Observe how attitudes towards ingroups and outgroups change when LLMs are assigned a Republican or Democrat identity.
3. **Evaluate Bias Mitigation Methods**: Test how effectively inherent biases can be mitigated by guiding LLMs to adopt the perspective of the initially disfavored group.

### Experimental Design

- **Experimental Conditions**:
  - **No Identity Baseline**: Observe the default attitudes of LLMs towards political value statements without assigning any specific identity.
  - **Republican Identity**: Observe the attitudes of LLMs towards Republican and Democrat value statements by prompting "You are a Republican."
  - **Democrat Identity**: Observe the attitudes of LLMs towards Republican and Democrat value statements by prompting "You are a Democrat."
- **Measurement Methods**:
  - Use value statements from the political compass test to collect the degree of agreement or disagreement from LLMs for each statement.
  - Encode the answers on a 6-point scale, from -3 (strongly disagree) to +3 (strongly agree).

### Main Findings

1. **Political Bias**:
   - Under the no identity baseline condition, GPT-4o exhibits a strong liberal inclination and support for Democrat value statements, while showing moderate opposition to conservative and Republican value statements.
   - This bias is also validated under other reference conditions (e.g., "human" and "politically independent" identities).
2. **Ingroup Favoritism**:
   - When assigned a Republican identity, GPT-4o's support for Republican value statements significantly increases, showing strong ingroup favoritism.
   - When assigned a Democrat identity, GPT-4o's support for Democrat value statements also significantly increases, but the ingroup favoritism is weaker.
3. **Outgroup Bias**:
   - When assigned a Republican identity, GPT-4o's support for Democrat value statements significantly decreases, showing strong outgroup bias.
   - When assigned a Democrat identity, GPT-4o's support for Republican value statements also significantly decreases, but the outgroup bias is weaker.
4. **Bias Mitigation**:
   - When GPT-4o is assigned the initially disfavored Republican identity, the resulting ingroup favoritism and outgroup bias counterbalance its default tendencies, reducing the initial liberal inclination and anti-conservative bias.
   - This method corrects political bias more effectively than general debiasing instructions.

### Replication Study on Gender Bias

- **Experimental Setup**:
  - Repeat the experiment for gender bias, keeping the same sample size, temperature settings, and response encoding.
  - Observe the attitudes of LLMs towards gender stereotypes by assigning male or female identities.
- **Main Findings**:
  - Under the no identity baseline condition, GPT-4o exhibits a strong preference for females and a moderate preference for males.
  - After a male identity is assigned, GPT-4o's preference for females decreases and its preference for males increases.
  - After a female identity is assigned, GPT-4o's preference for males decreases and its preference for females increases.

### Conclusion

This paper demonstrates through experiments that outgroup bias in LLMs is as significant as ingroup favoritism.
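
The measurement procedure summarized above (identity prompt, agreement rating, comparison against the no identity baseline) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the label wording in `SCALE`, the omission of a neutral 0 point, the question text in `build_prompt`, and the helper names are all hypothetical, and the responses in the usage section are made-up toy values rather than results from the paper. Only the two identity prompts and the -3 to +3 range come from the summary.

```python
from statistics import mean

# Assumed label wording; the summary only states a 6-point scale from
# -3 (strongly disagree) to +3 (strongly agree). Skipping 0 is an assumption.
SCALE = {
    "strongly disagree": -3,
    "disagree": -2,
    "slightly disagree": -1,
    "slightly agree": 1,
    "agree": 2,
    "strongly agree": 3,
}

# Identity prompts quoted in the summary; None marks the no identity baseline.
IDENTITY_PROMPTS = {
    "baseline": None,
    "republican": "You are a Republican.",
    "democrat": "You are a Democrat.",
}


def build_prompt(condition, statement):
    """Prefix a value statement with the identity prompt, if the condition has one."""
    identity = IDENTITY_PROMPTS[condition]
    # The question wording is hypothetical.
    question = f'Statement: "{statement}"\nHow much do you agree or disagree?'
    return question if identity is None else f"{identity}\n{question}"


def encode(label):
    """Map a textual answer to its numeric score."""
    return SCALE[label.strip().lower()]


def mean_score(labels):
    """Average encoded score over a list of textual answers."""
    return mean(encode(label) for label in labels)


def identity_shifts(baseline, assigned, ingroup, outgroup):
    """Change in mean agreement relative to the baseline.

    A positive ingroup shift corresponds to ingroup favoritism; a negative
    outgroup shift corresponds to outgroup bias.
    """
    return {
        "ingroup_shift": mean_score(assigned[ingroup]) - mean_score(baseline[ingroup]),
        "outgroup_shift": mean_score(assigned[outgroup]) - mean_score(baseline[outgroup]),
    }


# Toy, made-up answers keyed by the party whose value statements were rated.
baseline = {
    "republican": ["disagree", "slightly disagree"],
    "democrat": ["agree", "strongly agree"],
}
as_republican = {
    "republican": ["agree", "strongly agree"],
    "democrat": ["disagree", "slightly disagree"],
}

shifts = identity_shifts(baseline, as_republican, ingroup="republican", outgroup="democrat")
print(build_prompt("republican", "An example value statement."))
print(shifts)
```

Measuring both shifts against the same baseline is what lets the two effects be compared in magnitude, which is the comparison the summary's central claim rests on.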

Persona Setting Pitfall: Persistent Outgroup Biases in Large Language Models Arising from Social Identity Adoption

I Am Not Them: Fluid Identities and Persistent Out-group Bias in Large Language Models
