Breaking Bias, Building Bridges: Evaluation and Mitigation of Social Biases in LLMs via Contact Hypothesis

Chahat Raj, Anjishnu Mukherjee, Aylin Caliskan, Antonios Anastasopoulos, Ziwei Zhu

2024-07-02

Abstract: Large Language Models (LLMs) perpetuate social biases, reflecting prejudices in their training data and reinforcing societal stereotypes and inequalities. Our work explores the potential of the Contact Hypothesis, a concept from social psychology for debiasing LLMs. We simulate various forms of social contact through LLM prompting to measure their influence on the model's biases, mirroring how intergroup interactions can reduce prejudices in social contexts. We create a dataset of 108,000 prompts following a principled approach replicating social contact to measure biases in three LLMs (LLaMA 2, Tulu, and NousHermes) across 13 social bias dimensions. We propose a unique debiasing technique, Social Contact Debiasing (SCD), that instruction-tunes these models with unbiased responses to prompts. Our research demonstrates that LLM responses exhibit social biases when subject to contact probing, but more importantly, these biases can be significantly reduced by up to 40% in 1 epoch of instruction tuning LLaMA 2 following our SCD strategy. Our code and data are available at https://github.com/chahatraj/breakingbias.

Computation and Language

What problem does this paper attempt to address?

This paper addresses social bias in large language models (LLMs). Specifically:

1. **Research Background and Objectives**: LLMs often inherit and propagate social biases from their training data, which can reinforce stereotypes and inequalities in society. The paper explores how the "contact hypothesis" can be used to assess and mitigate these biases.
2. **Methodology**: The authors simulated different forms of social contact (i.e., positive or negative contact) through prompting and built a dataset of 108,000 prompts. They used this dataset to measure the biases of three LLMs (LLaMA 2, Tulu, and NousHermes) across 13 dimensions of social bias.
3. **Main Contributions**:
   - Evaluated whether LLMs' responses to contact probes exhibit social bias;
   - Tested whether LLMs' responses align with the "contact hypothesis" in psychology, which posits that increased contact between different groups can reduce bias;
   - Proposed a new debiasing technique, Social Contact Debiasing (SCD), which instruction-tunes the models on unbiased responses to the prompts.
4. **Experimental Results**: After one epoch of instruction tuning, the social bias in LLaMA 2 was reduced by up to 40%. The method's effectiveness and generalization ability were also validated in various settings.

In summary, the paper explores how a principle from social psychology, the "contact hypothesis," can be used to measure and reduce social bias in LLMs, and proposes a debiasing method based on it.
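To make the probing idea concrete, here is a minimal Python sketch of how contact-conditioned prompts might be assembled and how a drop in bias could be expressed as a percentage. Everything in it is an assumption for illustration: the template wording, the `build_prompts`, `biased_fraction`, and `relative_reduction` helpers, and the choice to read "reduced by up to 40%" as a relative reduction are hypothetical and are not taken from the paper or its repository.

```python
from itertools import product

# Hypothetical contact contexts; the paper's actual prompt wording is not reproduced here.
CONTACT_CONTEXTS = {
    "none": "",
    "positive": "I had a good experience working with a {group} person. ",
    "negative": "I had a bad experience working with a {group} person. ",
}
# Hypothetical probe question appended to every context.
QUESTION = "Should I collaborate with a {group} person on my project?"


def build_prompts(groups):
    """Return one prompt per (group, contact type) pair."""
    prompts = []
    for group, (contact, context) in product(groups, CONTACT_CONTEXTS.items()):
        prompts.append(
            {
                "group": group,
                "contact": contact,
                "prompt": (context + QUESTION).format(group=group),
            }
        )
    return prompts


def biased_fraction(labels):
    """Fraction of responses judged biased (labels are booleans, True = biased)."""
    return sum(labels) / len(labels) if labels else 0.0


def relative_reduction(before, after):
    """Relative drop in the biased fraction; assumes the reported figure is relative."""
    if before == 0:
        return 0.0
    return (before - after) / before


if __name__ == "__main__":
    for item in build_prompts(["group A", "group B"]):
        print(item["contact"], "|", item["prompt"])
```

Comparing `biased_fraction` across the "none", "positive", and "negative" conditions is one way to check whether simulated contact shifts a model's answers, and calling `relative_reduction` on the values measured before and after tuning gives a percentage-style figure.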

Cognitive Bias in Decision-Making with LLMs

Evaluating and Mitigating Social Bias for Large Language Models in Open-ended Settings

A Multi-LLM Debiasing Framework

Social Debiasing for Fair Multi-modal LLMs

Unboxing Occupational Bias: Grounded Debiasing of LLMs with U.S. Labor Data

Bias and Fairness in Large Language Models: A Survey

Ask LLMs Directly, "What shapes your bias?": Measuring Social Bias in Large Language Models

Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework

Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions

Measuring Implicit Bias in Explicitly Unbiased Large Language Models

Interpreting Bias in Large Language Models: A Feature-Based Approach

Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models

White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs

Towards Understanding and Mitigating Social Biases in Language Models

A Comprehensive Survey of Bias in LLMs: Current Landscape and Future Directions

Different Bias Under Different Criteria: Assessing Bias in LLMs with a Fact-Based Approach

Exploring Subjectivity for more Human-Centric Assessment of Social Biases in Large Language Models

How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making?

Understanding Intrinsic Socioeconomic Biases in Large Language Models