Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework

Jingling Li, Zeyu Tang, Xiaoyu Liu, Peter Spirtes, Kun Zhang, Liu Leqi, Yang Liu
2024-03-14
Abstract: Large language models (LLMs) can easily generate biased and discriminatory responses. As LLMs tap into consequential decision-making (e.g., hiring and healthcare), it is of crucial importance to develop strategies to mitigate these biases. This paper focuses on social bias, tackling the association between demographic information and LLM outputs. We propose a causality-guided debiasing framework that utilizes causal understandings of (1) the data-generating process of the training corpus fed to LLMs, and (2) the internal reasoning process of LLM inference, to guide the design of prompts for debiasing LLM outputs through selection mechanisms. Our framework unifies existing debiasing prompting approaches such as inhibitive instructions and in-context contrastive examples, and sheds light on new ways of debiasing by encouraging bias-free reasoning. Our strong empirical performance on real-world datasets demonstrates that our framework provides principled guidelines on debiasing LLM outputs even with only black-box access.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses social bias in LLM outputs, specifically the association between demographic information and model responses. As these models are increasingly used in consequential decision-making such as hiring and healthcare, mitigating this bias becomes crucial. The authors propose a causality-guided debiasing framework that draws on causal understanding of (1) the data-generating process of the training corpus and (2) the internal reasoning process of LLM inference to guide prompt design, debiasing outputs through selection mechanisms. The framework unifies existing prompting-based approaches, such as inhibitive instructions and in-context contrastive examples, and points to new ones that encourage bias-free reasoning. Empirical results on real-world datasets indicate that this approach can steer LLMs toward unbiased responses even with only black-box access.
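To make the two prompting styles named in the abstract concrete, the following is a minimal sketch of how a black-box debiasing prompt might be assembled from an inhibitive instruction and in-context contrastive examples. It is not the authors' implementation: the names `ContrastiveExample` and `build_debiasing_prompt`, the `Q:`/`A:` layout, and all example wording are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ContrastiveExample:
    """Hypothetical pair of questions that differ only in a demographic term
    and should therefore receive the same answer."""
    question_a: str
    question_b: str
    shared_answer: str


def build_debiasing_prompt(
    question: str,
    inhibitive_instruction: Optional[str] = None,
    examples: Sequence[ContrastiveExample] = (),
) -> str:
    """Assemble a plain-text prompt from optional debiasing components.

    The layout is an assumption for illustration, not the paper's prompt format.
    """
    parts = []
    if inhibitive_instruction:
        # Inhibitive instruction: tell the model what it must not rely on.
        parts.append(inhibitive_instruction)
    for ex in examples:
        # Contrastive pair: same answer regardless of the demographic term.
        parts.append(
            f"Q: {ex.question_a}\nA: {ex.shared_answer}\n"
            f"Q: {ex.question_b}\nA: {ex.shared_answer}"
        )
    # The actual query goes last, with the answer left open for the model.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo = build_debiasing_prompt(
        "The doctor thanked the nurse. Who does 'the nurse' refer to?",
        inhibitive_instruction=(
            "Do not use gender or other demographic information to decide the answer."
        ),
        examples=[
            ContrastiveExample(
                question_a="He is an engineer. Is he qualified for the role?",
                question_b="She is an engineer. Is she qualified for the role?",
                shared_answer="Qualification depends on skills, not gender.",
            )
        ],
    )
    print(demo)
```

The resulting string can be sent to any chat or completion endpoint, which is why this style of intervention needs only black-box access. Both components are optional so that their effects can be compared separately.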