Abstract: Counterfactual Data Augmentation (CDA) has been one of the preferred techniques for mitigating gender bias in natural language models. CDA techniques have mostly employed word substitution based on dictionaries. Although such dictionary-based CDA techniques have been shown to significantly improve the mitigation of gender bias, in this paper, we highlight some limitations of such dictionary-based counterfactual data augmentation techniques, such as susceptibility to ungrammatical compositions, and lack of generalization outside the set of predefined dictionary words. Model-based solutions can alleviate these problems, yet the lack of qualitative parallel training data hinders development in this direction. Therefore, we propose a combination of data processing techniques and a bi-objective training regime to develop a model-based solution for generating counterfactuals to mitigate gender bias. We implemented our proposed solution and performed an empirical evaluation which shows how our model alleviates the shortcomings of dictionary-based solutions.

What problem does this paper attempt to address?

This paper addresses the limitations of existing dictionary-based counterfactual data augmentation (CDA) techniques for mitigating gender bias in natural language models. Specifically:

1. **Grammatical incoherence**: Dictionary-based methods are prone to generating ungrammatical sentences because they typically apply simple word-replacement rules without considering context.
2. **Limited generalization**: These methods depend on a predefined set of dictionary words and cannot handle gendered words outside that set.
3. **Lack of high-quality parallel data**: Model-based methods require parallel data for training, but such data is scarce, which hinders the development of model-based CDA techniques.

To overcome these problems, the authors propose a model-based counterfactual data generator (MBCDA), which combines data processing techniques with a bi-objective training regime to generate higher-quality counterfactual texts and mitigate gender bias more effectively. The specific contributions are:

1. **Data processing pipeline**: Generates high-quality parallel data from the output of dictionary-based CDA.
2. **Bi-objective training model**: Introduces a generator and a discriminator so that the generated counterfactual texts are grammatically correct and also change the gender association of the original text.

With these contributions, the authors aim to provide a more robust and effective alternative to existing dictionary-based CDA methods.
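The first two limitations of dictionary-based CDA can be illustrated with a minimal sketch. This is not the authors' code or any published word list: the `GENDER_PAIRS` dictionary and the `dictionary_cda` function are hypothetical, chosen only to show how context-free word swapping can yield ungrammatical text and how words outside the dictionary pass through unchanged.

```python
import re

# Hypothetical toy dictionary (an assumption for illustration only);
# real CDA word lists are much larger.
GENDER_PAIRS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",  # "her" can be object or possessive; one fixed mapping cannot cover both
    "his": "her",
    "man": "woman", "woman": "man",
    "actor": "actress", "actress": "actor",
}


def dictionary_cda(sentence: str) -> str:
    """Swap gendered words using a fixed dictionary, ignoring context."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = GENDER_PAIRS.get(word.lower())
        if replacement is None:
            return word  # out-of-dictionary words are left untouched
        # Preserve capitalization of the first letter.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, sentence)


if __name__ == "__main__":
    print(dictionary_cda("He thanked the actress."))  # a clean swap
    print(dictionary_cda("She lost her keys."))       # possessive "her" is mapped to "him"
    print(dictionary_cda("The policeman waved."))     # "policeman" is not in the dictionary
```

In this sketch, the second sentence becomes ungrammatical because the possessive "her" is replaced with the object form "him", and the third sentence is returned unchanged because "policeman" has no dictionary entry. A model-based generator, as the summary describes, is intended to avoid both failure modes by conditioning on context rather than on a fixed word list.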

Model-based Counterfactual Generator for Gender Bias Mitigation

FairFlow: An Automated Approach to Model-based Counterfactual Data Augmentation For NLP

Counter-GAP: Counterfactual Bias Evaluation through Gendered Ambiguous Pronouns

Reducing Gender Bias in Machine Translation through Counterfactual Data Generation

Addressing Both Statistical and Causal Gender Fairness in NLP Models

Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model

Improving Classifier Robustness through Active Generation of Pairwise Counterfactuals

Reducing Sentiment Bias in Language Models via Counterfactual Evaluation

Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation

Gender Bias in Neural Natural Language Processing

Targeted Data Augmentation for bias mitigation

How Does Counterfactually Augmented Data Impact Models for Social Computing Constructs?

Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal

On Evaluating and Mitigating Gender Biases in Multilingual Settings

Detecting and Mitigating Algorithmic Bias in Binary Classification using Causal Modeling

Mitigating Social Biases of Pre-trained Language Models via Contrastive Self-Debiasing with Double Data Augmentation

The Birth of Bias: A case study on the evolution of gender bias in an English language model

How Far Can It Go?: On Intrinsic Gender Bias Mitigation for Text Classification

People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection

Mitigating Large Language Model Bias: Automated Dataset Augmentation and Prejudice Quantification

Unmasking Contextual Stereotypes: Measuring and Mitigating BERT's Gender Bias