GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models

Kunsheng Tang, Wenbo Zhou, Jie Zhang, Aishan Liu, Gelei Deng, Shuai Li, Peigui Qi, Weiming Zhang, Tianwei Zhang, Nenghai Yu
2024-08-22
Abstract: Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but they have also been observed to magnify societal biases, particularly those related to gender. In response to this issue, several benchmarks have been proposed to assess gender bias in LLMs. However, these benchmarks often lack practical flexibility or inadvertently introduce biases. To address these shortcomings, we introduce GenderCARE, a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics for quantifying and mitigating gender bias in LLMs. To begin, we establish pioneering criteria for gender equality benchmarks, spanning dimensions such as inclusivity, diversity, explainability, objectivity, robustness, and realisticity. Guided by these criteria, we construct GenderPair, a novel pair-based benchmark designed to assess gender bias in LLMs comprehensively. Our benchmark provides standardized and realistic evaluations, including previously overlooked gender groups such as transgender and non-binary individuals. Furthermore, we develop effective debiasing techniques that incorporate counterfactual data augmentation and specialized fine-tuning strategies to reduce gender bias in LLMs without compromising their overall performance. Extensive experiments demonstrate a significant reduction in various gender bias benchmarks, with reductions peaking at over 90% and averaging above 35% across 17 different LLMs. Importantly, these reductions come with minimal variability in mainstream language tasks, remaining below 2%. By offering a realistic assessment and tailored reduction of gender biases, we hope that our GenderCARE can represent a significant step towards achieving fairness and equity in LLMs. More details are available at https://github.com/kstanghere/GenderCARE-ccs24.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses gender bias in large language models (LLMs).

- **Research Background**: Although LLMs perform very well at natural language generation, they also amplify existing societal gender biases, particularly against transgender and non-binary individuals. This bias undermines trust in the technology and reinforces harmful gender stereotypes, leading to inequalities in digital interactions.
- **Current Challenges**: Existing methods for evaluating gender bias (template-based, phrase-based, and option-based) have made contributions but still fall short: they lack transparency, are sensitive to changes in template structure, and do not adequately cover transgender and non-binary groups.
- **Main Objectives**: The paper proposes GenderCARE, a comprehensive framework for evaluating and reducing gender bias in LLMs under unified standards. It has three goals:
  - Establish unified standards for gender equality benchmarks;
  - Construct a gender bias evaluation benchmark that adheres to these standards;
  - Develop effective debiasing techniques that reduce gender bias without degrading the model's overall performance.
- **Specific Methods**: The paper introduces a new evaluation benchmark, GenderPair, which constructs its dataset using a pairing method. For debiasing, it combines counterfactual data augmentation, used to build a debiased dataset, with low-rank adaptation fine-tuning. A set of evaluation metrics quantifies gender bias in model outputs at both the lexical and semantic levels.

Through these measures, the paper aims to move LLMs toward fairer and more inclusive behavior, so that AI systems reflect fairness and equality in practice.
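
To make the counterfactual data augmentation step mentioned above more concrete, here is a minimal Python sketch of the general technique: each sentence containing a gendered term is paired with a copy in which that term is swapped for its counterpart. The swap table `SWAPS` and the helper names `counterfactual` and `augment` are hypothetical and chosen only for illustration; they are not taken from the paper or its repository, and the authors' actual procedure (in particular its handling of transgender and non-binary groups) may differ substantially.

```python
import re

# Hypothetical, deliberately tiny swap table (an assumption for illustration,
# not the paper's word list). Ambiguous forms such as "her" are left out.
SWAPS = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
    "father": "mother", "mother": "father",
    "son": "daughter", "daughter": "son",
}

# Match any listed term as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SWAPS)) + r")\b",
    re.IGNORECASE,
)


def _swap(match):
    """Return the counterpart of the matched term, keeping initial capitalization."""
    word = match.group(0)
    replacement = SWAPS[word.lower()]
    return replacement.capitalize() if word[0].isupper() else replacement


def counterfactual(sentence):
    """Swap every listed gendered term in the sentence for its counterpart."""
    return _PATTERN.sub(_swap, sentence)


def augment(corpus):
    """Return the corpus with a counterfactual copy added after each changed sentence."""
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        swapped = counterfactual(sentence)
        if swapped != sentence:  # skip sentences with no gendered terms
            augmented.append(swapped)
    return augmented


if __name__ == "__main__":
    for line in augment(["He thanked the woman.", "It rained all day."]):
        print(line)
```

Run as a script, this prints the original first sentence, its swapped copy ("She thanked the man."), and the unchanged second sentence. A binary swap table like this one cannot represent non-binary identities, so a benchmark that aims for the inclusivity described above would need a richer mapping than this sketch provides.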