Abstract: Large Language Models (LLMs) have become foundational in modern language-driven applications, profoundly influencing daily life. A critical technique in leveraging their potential is role-playing, where LLMs simulate diverse roles to enhance their real-world utility. However, while research has highlighted the presence of social biases in LLM outputs, it remains unclear whether and to what extent these biases emerge during role-playing scenarios. In this paper, we introduce BiasLens, a fairness testing framework designed to systematically expose biases in LLMs during role-playing. Our approach uses LLMs to generate 550 social roles across a comprehensive set of 11 demographic attributes, producing 33,000 role-specific questions targeting various forms of bias. These questions, spanning Yes/No, multiple-choice, and open-ended formats, are designed to prompt LLMs to adopt specific roles and respond accordingly. We employ a combination of rule-based and LLM-based strategies to identify biased responses, rigorously validated through human evaluation. Using the generated questions as the benchmark, we conduct extensive evaluations of six advanced LLMs released by OpenAI, Mistral AI, Meta, Alibaba, and DeepSeek. Our benchmark reveals 72,716 biased responses across the studied LLMs, with individual models yielding between 7,754 and 16,963 biased responses, underscoring the prevalence of bias in role-playing contexts. To support future research, we have publicly released the benchmark, along with all scripts and experimental results.

What problem does this paper attempt to address?

The problem this paper addresses is whether, and to what extent, large language models (LLMs) exhibit social biases in role-playing scenarios. Specifically, the researchers examine whether LLMs behave unfairly or in a discriminatory way towards certain groups when simulating different roles. This matters because LLMs are widely applied in fields such as finance, healthcare, law enforcement, education, and social decision-making, where biases in these models may exacerbate existing social inequalities and reinforce harmful social stereotypes.

### Main contributions of the paper

1. **Custom-made test framework**: Introduced BiasLens, an automated fairness testing framework specifically designed to detect social biases in LLMs during role-playing.
2. **Extensive empirical research**: Conducted a large-scale empirical evaluation of six advanced LLMs, using 33,000 questions generated by BiasLens, and revealed a total of 72,716 biased responses.
3. **Open benchmark and resources**: Released the benchmark datasets, scripts, and experimental results to promote the adoption of BiasLens and further research.

### Workflow of the BiasLens framework

1. **Automatic test input generation**:
   - **Role generation**: Used GPT-4o to generate 550 roles covering 11 sociodemographic attributes (such as ability, age, and physical characteristics); these are roles that may exhibit biased or discriminatory behavior.
   - **Question generation**: Generated 60 questions for each role, including Yes/No, multiple-choice, and open-ended questions, aiming to trigger biased responses from LLMs when they assume these roles.
2. **Automatic test oracle generation**:
   - **Rule-based test oracle**: For Yes/No and multiple-choice questions, rules determine whether an answer is biased.
   - **LLM-based test oracle**: For open-ended questions, three LLMs act as judges that evaluate whether an answer is biased, and the final verdict is decided by majority vote.

### Experimental design and results

The researchers tested six advanced LLMs from OpenAI, Mistral AI, Meta, Alibaba, and DeepSeek. Each question was asked three times to each LLM and was classified as biased only when bias occurred in more than two answers. Despite these strict evaluation criteria, the benchmark still found a total of 72,716 biased responses, indicating that LLMs do introduce additional social biases in role-playing scenarios.

### Conclusion

Through this research, the authors emphasize the importance of fairness testing for LLMs in role-playing scenarios and provide specific tools and methods to identify and reduce social biases in these models. This helps improve the quality of LLM applications and can also reduce potential social inequalities and discriminatory behavior in practice.
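
The oracle and aggregation logic described in the workflow and experimental design can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the rule that a "yes" answer signals bias, and the `threshold` parameter are all assumptions made for illustration. The sketch reads the summary literally, so a question counts as biased only when strictly more than two of the three repeated answers are flagged.

```python
from typing import Sequence

# Figures taken from the summary: 550 roles, 60 questions per role.
NUM_ROLES = 550
QUESTIONS_PER_ROLE = 60
TOTAL_QUESTIONS = NUM_ROLES * QUESTIONS_PER_ROLE  # 33,000 questions


def yes_no_oracle(answer: str, biased_answer: str = "yes") -> bool:
    """Rule-based oracle (hypothetical rule): flag an answer as biased
    when it starts with the option assumed to signal bias."""
    return answer.strip().lower().startswith(biased_answer)


def majority_vote(judge_verdicts: Sequence[bool]) -> bool:
    """LLM-based oracle: an open-ended answer is biased when more than
    half of the judges (three in the paper) flag it."""
    return sum(judge_verdicts) > len(judge_verdicts) / 2


def is_biased_across_runs(run_verdicts: Sequence[bool], threshold: int = 2) -> bool:
    """Aggregate repeated runs of the same question. Following the summary
    literally, bias must occur in strictly more than `threshold` answers
    (the default of 2 is an assumption matching "more than two")."""
    return sum(run_verdicts) > threshold


if __name__ == "__main__":
    print(TOTAL_QUESTIONS)
    # Two of three judges flag the answer, so the majority verdict is "biased".
    print(majority_vote([True, True, False]))
    # Only two of three runs are biased, which is not "more than two".
    print(is_biased_across_runs([True, True, False]))
```

Keeping the per-answer oracle separate from the per-question aggregation makes it easy to change either the judging rule or the repetition threshold without touching the other.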

Benchmarking Bias in Large Language Models during Role-Playing
