Abstract: Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but they have also been observed to amplify societal biases, particularly those related to gender. In response, several benchmarks have been proposed to assess gender bias in LLMs. However, these benchmarks often lack practical flexibility or inadvertently introduce biases themselves. To address these shortcomings, we introduce GenderCARE, a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics for quantifying and mitigating gender bias in LLMs. To begin, we establish pioneering criteria for gender equality benchmarks, spanning dimensions such as inclusivity, diversity, explainability, objectivity, robustness, and realisticity. Guided by these criteria, we construct GenderPair, a novel pair-based benchmark designed to assess gender bias in LLMs comprehensively. Our benchmark provides standardized and realistic evaluations, including previously overlooked gender groups such as transgender and non-binary individuals. Furthermore, we develop effective debiasing techniques that incorporate counterfactual data augmentation and specialized fine-tuning strategies to reduce gender bias in LLMs without compromising their overall performance. Extensive experiments show significant reductions across various gender bias benchmarks, with reductions peaking at over 90% and averaging above 35% across 17 different LLMs. Importantly, these reductions come with minimal variation in performance on mainstream language tasks, which remains below 2%. By offering a realistic assessment and tailored reduction of gender biases, we hope that GenderCARE represents a significant step towards fairness and equity in LLMs. More details are available at https://github.com/kstanghere/GenderCARE-ccs24.
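Counterfactual data augmentation (CDA), one of the ingredients named above, is commonly done by pairing each training sentence with a copy in which gendered words are swapped. The sketch below illustrates that generic idea only. It is not GenderCARE's implementation. The word list, function names, and case-handling rules are hypothetical assumptions. A simple binary swap list like this one also does not cover the transgender and non-binary groups that GenderPair includes, so it should be read as a baseline illustration only.

```python
import re

# Hypothetical, deliberately tiny lexicon of binary gendered word pairs.
# Ambiguous words such as "her" (him/his) are omitted on purpose.
PAIRS = [("he", "she"), ("himself", "herself"), ("man", "woman"),
         ("men", "women"), ("boy", "girl"), ("father", "mother"),
         ("brother", "sister")]

# Build a bidirectional lookup: "he" -> "she" and "she" -> "he", etc.
SWAP = {}
for a, b in PAIRS:
    SWAP[a] = b
    SWAP[b] = a

# Match whole words only (\b), so "man" inside "woman" or "he" inside "the"
# is never swapped. Matching is case-insensitive; case is restored below.
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(SWAP, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def _swap_match(m: re.Match) -> str:
    """Return the counterpart of the matched word, preserving its casing."""
    word = m.group(0)
    repl = SWAP[word.lower()]
    if word.isupper():
        return repl.upper()
    if word[0].isupper():
        return repl.capitalize()
    return repl


def counterfactual(text: str) -> str:
    """Swap every listed gendered word in `text` for its counterpart."""
    return _PATTERN.sub(_swap_match, text)


def augment(corpus: list[str]) -> list[str]:
    """Keep each sentence and append its counterfactual when it differs."""
    out = []
    for sentence in corpus:
        out.append(sentence)
        cf = counterfactual(sentence)
        if cf != sentence:
            out.append(cf)
    return out


if __name__ == "__main__":
    print(counterfactual("He is a father and a good man."))
```

For example, `augment(["He runs.", "The sky is blue."])` keeps both sentences and adds "She runs." after the first one. The sentence without gendered words is left unduplicated. Fine-tuning on such a balanced corpus is one common way CDA is used. How GenderCARE combines it with its specialized fine-tuning strategies is described in the paper and repository, not here.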

Multi-Dimensional Gender Bias Classification

Mitigating Gender Bias in Machine Learning Data Sets

Gender Bias in Text: Labeled Datasets and Lexicons

Identifying and Reducing Gender Bias in Word-Level Language Models

Exploration, detection, and mitigation: Unveiling gender bias in NLP

Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias

Toward Gender-Inclusive Coreference Resolution: An Analysis of Gender and Bias Throughout the Machine Learning Lifecycle

Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting

How Far Can It Go?: On Intrinsic Gender Bias Mitigation for Text Classification

Gender Bias in Multimodal Models: A Transnational Feminist Approach Considering Geographical Region and Culture

Gender Bias in Neural Natural Language Processing

GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models

Gender Bias in Large Language Models across Multiple Languages

Auditing Gender Analyzers on Text Data

Locating and Mitigating Gender Bias in Large Language Models

Easy Adaptation to Mitigate Gender Bias in Multilingual Text Classification

Measuring Gender and Racial Biases in Large Language Models

Mitigating Gender Bias in Natural Language Processing: Literature Review

Reducing Gender Bias in Abusive Language Detection

Gender Bias in Decision-Making with Large Language Models: A Study of Relationship Conflicts