Unboxing Occupational Bias: Grounded Debiasing of LLMs with U.S. Labor Data

Atmika Gorti, Manas Gaur, Aman Chadha
2024-08-27
Abstract: Large Language Models (LLMs) are prone to inheriting and amplifying societal biases embedded within their training data, potentially reinforcing harmful stereotypes related to gender, occupation, and other sensitive categories. This issue becomes particularly problematic as biased LLMs can have far-reaching consequences, leading to unfair practices and exacerbating social inequalities across various domains, such as recruitment, online content moderation, or even the criminal justice system. Although prior research has focused on detecting bias in LLMs using specialized datasets designed to highlight intrinsic biases, there has been a notable lack of investigation into how these findings correlate with authoritative datasets, such as those from the U.S. National Bureau of Labor Statistics (NBLS). To address this gap, we conduct empirical research that evaluates LLMs in a "bias-out-of-the-box" setting, analyzing how the generated outputs compare with the distributions found in NBLS data. Furthermore, we propose a straightforward yet effective debiasing mechanism that directly incorporates NBLS instances to mitigate bias within LLMs. Our study spans seven different LLMs, including instructable, base, and mixture-of-expert models, and reveals significant levels of bias that are often overlooked by existing bias detection techniques. Importantly, our debiasing method, which does not rely on external datasets, demonstrates a substantial reduction in bias scores, highlighting the efficacy of our approach in creating fairer and more reliable LLMs.
Computation and Language
What problem does this paper attempt to address?
The paper primarily explores the issues of gender, racial, and religious biases in large language models (LLMs) when generating career recommendations and attempts to mitigate these biases using data from the U.S. National Bureau of Labor Statistics (NBLS). Specifically:

1. **Research Background and Objectives**:
   - The study finds that current LLMs tend to inherit and amplify social biases present in training data, particularly in career recommendations, potentially reinforcing gender and racial stereotypes.
   - These biases may lead to unfair practices, exacerbating social inequalities, especially in areas such as recruitment, online content moderation, and even the criminal justice system.
2. **Research Methods**:
   - The paper employs an "out-of-the-box" bias analysis framework to evaluate seven different LLMs (including instructable, foundational, and mixture-of-experts models) and compares them with NBLS data.
   - A simple yet effective debiasing mechanism based on NBLS instances is proposed to reduce biases in LLMs.
3. **Experimental Design**:
   - Two prompting methods were used: zero-shot prompting (ZSP) and few-shot prompting (FSP), and the models were tested through various task types (such as sentence completion, multiple-choice questions, etc.).
   - Debiasing prompt templates were designed to avoid stereotypical responses and encourage the generation of unbiased responses.
4. **Main Contributions**:
   - It was found that existing bias detection techniques often overlook some significant biases in LLMs.
   - The proposed debiasing method significantly reduced bias scores without relying on external datasets, demonstrating its effectiveness in creating fairer and more reliable LLMs.

In summary, the paper aims to reveal and mitigate various social biases present in LLMs' career recommendations, thereby improving the fairness and reliability of these models.
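
To make the comparison between model outputs and labor-data distributions concrete, here is a minimal Python sketch. It is not the paper's bias score, which this summary does not define. It assumes, purely for illustration, that the gap is measured with total variation distance between two categorical distributions. The helper names (`empirical_distribution`, `total_variation`) and all numbers are hypothetical placeholders, not real labor statistics or model results.

```python
from collections import Counter


def empirical_distribution(labels):
    """Turn a list of categorical labels into a {label: probability} dict."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    0.0 means the distributions are identical; 1.0 means they share no mass.
    This metric is an assumption of the sketch, not the paper's bias score.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


# Hypothetical reference share for one occupation (placeholder, NOT real data).
reference_share = {"female": 0.5, "male": 0.5}

# Hypothetical gender labels extracted from ten model completions
# for a prompt about the same occupation (placeholder values).
model_outputs = ["male"] * 8 + ["female"] * 2

model_share = empirical_distribution(model_outputs)
gap = total_variation(model_share, reference_share)
print(f"model share: {model_share}")
print(f"gap vs. reference: {gap:.3f}")
```

In a setup like this, a smaller gap after applying a debiasing prompt would suggest the model's outputs moved closer to the reference distribution. Whether that matches the paper's own metric would need to be checked against the full text.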