Breaking Bias, Building Bridges: Evaluation and Mitigation of Social Biases in LLMs via Contact Hypothesis

Chahat Raj, Anjishnu Mukherjee, Aylin Caliskan, Antonios Anastasopoulos, Ziwei Zhu
2024-07-02
Abstract: Large Language Models (LLMs) perpetuate social biases, reflecting prejudices in their training data and reinforcing societal stereotypes and inequalities. Our work explores the potential of the Contact Hypothesis, a concept from social psychology, for debiasing LLMs. We simulate various forms of social contact through LLM prompting to measure their influence on the model's biases, mirroring how intergroup interactions can reduce prejudices in social contexts. We create a dataset of 108,000 prompts following a principled approach replicating social contact to measure biases in three LLMs (LLaMA 2, Tulu, and NousHermes) across 13 social bias dimensions. We propose a unique debiasing technique, Social Contact Debiasing (SCD), that instruction-tunes these models with unbiased responses to prompts. Our research demonstrates that LLM responses exhibit social biases when subject to contact probing, but more importantly, these biases can be significantly reduced by up to 40% in 1 epoch of instruction tuning LLaMA 2 following our SCD strategy. Our code and data are available at https://github.com/chahatraj/breakingbias.
Computation and Language
What problem does this paper attempt to address?
This paper addresses social bias in large language models (LLMs). Specifically:

1. **Research Background and Objectives**: LLMs often inherit and propagate social biases from their training data, which can reinforce stereotypes and inequalities in society. The paper explores how the "contact hypothesis" can be used to assess and mitigate these biases.
2. **Methodology**: The authors simulate different forms of social interaction (i.e., positive or negative contact) through prompting and construct a dataset of 108,000 prompts. They use this dataset to measure bias in three LLMs (LLaMA 2, Tulu, and NousHermes) across 13 dimensions of social bias.
3. **Main Contributions**:
   - Evaluated whether LLMs' responses to contact probes exhibit social bias;
   - Tested whether LLMs' responses align with the "contact hypothesis" from psychology, which posits that increased contact between different groups can reduce bias;
   - Proposed a new debiasing technique, Social Contact Debiasing (SCD), which instruction-tunes the model on unbiased responses to the prompts.
4. **Experimental Results**: After one epoch of instruction tuning with SCD, social bias in LLaMA 2 was reduced by up to 40%. The method's effectiveness and generalization ability were also validated in various settings.

In summary, the paper explores how a principle from social psychology can be used to reduce social bias in LLMs and proposes a new debiasing method based on the "contact hypothesis."
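The contact-probing setup summarized above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the templates, the `"none"` baseline condition, and the helper names `build_prompts`, `bias_rate`, and `relative_reduction` are hypothetical and are not taken from the paper or its repository. It also assumes, without confirmation from the source, that "reduced by up to 40%" can be read as a relative drop in the fraction of biased responses.

```python
from dataclasses import dataclass

# Hypothetical templates; the paper's actual prompt wording is not shown here.
CONTACT_TEMPLATES = {
    "none": "",  # assumed no-contact baseline
    "positive": "I had a positive interaction with someone from {group}. ",
    "negative": "I had a negative interaction with someone from {group}. ",
}
QUESTION = "Would you expect someone from {group} to be reliable? Answer yes or no."


@dataclass(frozen=True)
class ContactPrompt:
    group: str
    contact: str  # one of the keys of CONTACT_TEMPLATES
    text: str


def build_prompts(groups):
    """Build one prompt per (group, contact condition) pair."""
    prompts = []
    for group in groups:
        for contact, prefix in CONTACT_TEMPLATES.items():
            text = prefix.format(group=group) + QUESTION.format(group=group)
            prompts.append(ContactPrompt(group, contact, text))
    return prompts


def bias_rate(is_biased):
    """Fraction of responses labeled as biased (labels are booleans)."""
    if not is_biased:
        raise ValueError("need at least one labeled response")
    return sum(is_biased) / len(is_biased)


def relative_reduction(before, after):
    """Relative drop in bias rate, e.g. 0.4 means a 40% reduction."""
    if before <= 0:
        raise ValueError("baseline bias rate must be positive")
    return (before - after) / before


if __name__ == "__main__":
    for p in build_prompts(["group A", "group B"]):
        print(p.contact, "|", p.text)
    # Made-up labels purely to exercise the arithmetic, not reported results.
    before = bias_rate([True, True, False, False])
    after = bias_rate([True, False, False, False])
    print(relative_reduction(before, after))
```

Keeping the contact condition as an explicit field on each prompt makes it straightforward to compare bias rates between conditions, which is the comparison the contact hypothesis is about.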