Abstract: In this work, we analyze the gender bias induced by BERT in downstream tasks and propose solutions to reduce it. Contextual language models (CLMs) have pushed NLP benchmarks to new heights, and it has become the norm to use CLM-provided word embeddings in downstream tasks such as text classification. However, unless addressed, CLMs are prone to learning the gender bias intrinsic to the dataset. As a result, the predictions of downstream NLP models can vary noticeably when gender words are varied, for example by replacing "he" with "she", or even when gender-neutral words are varied. In this paper, we focus our analysis on a popular CLM, namely BERT. We analyze the gender bias it induces in five downstream tasks related to emotion and sentiment intensity prediction. For each task, we train a simple regressor on BERT's word embeddings. We then evaluate the gender bias in the regressors using an equity evaluation corpus. Ideally, and by design, the models should discard gender-informative features from the input. However, the results show that the systems' predictions depend significantly on gender-specific words and phrases. We claim that such biases can be reduced by removing gender-specific features from the word embeddings. Hence, for each layer in BERT, we identify directions that primarily encode gender information. The space formed by such directions is referred to as the gender subspace in the semantic space of word embeddings. We propose an algorithm that finds fine-grained gender directions, i.e., one primary direction for each BERT layer. This obviates the need to model the gender subspace in multiple dimensions and prevents other crucial information from being removed. Experiments show that removing the embedding components along the gender directions substantially reduces BERT-induced bias in the downstream tasks.
The investigation reveals significant gender bias that a contextualized language model (i.e., BERT) induces in downstream tasks. The proposed solution appears promising for reducing such biases.
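
The removal step described in the abstract, dropping the component of each embedding along a single gender direction per layer, can be illustrated with a small projection sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `remove_direction` and `debias_layers` are hypothetical, the per-layer gender directions are assumed to be given already (the paper's algorithm for finding them is not reproduced here), and embeddings are assumed to be 2-D arrays of shape (tokens, hidden size).

```python
import numpy as np


def remove_direction(embeddings, direction):
    """Project out one direction from a batch of embeddings.

    embeddings: array of shape (n_tokens, hidden_size)
    direction:  array of shape (hidden_size,), assumed to be a gender
                direction found by some other procedure (hypothetical input).
    Returns an array of the same shape whose rows are orthogonal to `direction`.
    """
    g = np.asarray(direction, dtype=float)
    norm = np.linalg.norm(g)
    if norm == 0:
        raise ValueError("direction must be non-zero")
    g = g / norm  # unit vector along the gender direction

    X = np.asarray(embeddings, dtype=float)
    # (X @ g) holds each row's scalar component along g; subtract that component.
    return X - np.outer(X @ g, g)


def debias_layers(layer_embeddings, layer_directions):
    """Apply remove_direction layer by layer, one direction per layer."""
    return [
        remove_direction(X, g)
        for X, g in zip(layer_embeddings, layer_directions)
    ]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))   # stand-in for one layer's token embeddings
    g = rng.normal(size=8)        # stand-in for that layer's gender direction
    X_clean = remove_direction(X, g)
    # Components along g should now be numerically zero.
    print(np.abs(X_clean @ g).max())
```

Because only a single direction is removed per layer, every component of the embedding orthogonal to that direction is left unchanged, which is consistent with the abstract's stated goal of not discarding other information.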

Are Male Candidates Better than Females? Debiasing BERT Resume Retrieval System

Unmasking the Stereotypes: Evaluating Social Biases in Chinese BERT

Debiasing Gender Bias in Information Retrieval Models

AI Gender Bias, Disparities, and Fairness: Does Training Data Matter?

Gender Bias in BERT -- Measuring and Analysing Biases through Sentiment Rating in a Realistic Downstream Classification Task

Investigating Gender Bias in BERT

Projective Methods for Mitigating Gender Bias in Pre-trained Language Models

Does Debiasing Inevitably Degrade the Model Performance

Unmasking Contextual Stereotypes: Measuring and Mitigating BERT's Gender Bias

Detecting Gender Bias in Transformer-based Models: A Case Study on BERT

Gender Bias in Neural Natural Language Processing

JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models

Stereotype and Skew: Quantifying Gender Bias in Pre-trained and Fine-tuned Language Models

Interpretable bias mitigation for textual data: Reducing gender bias in patient notes while maintaining classification performance

Measuring Gender and Racial Biases in Large Language Models

An investigation of structures responsible for gender bias in BERT and DistilBERT

Measuring and Mitigating Gender Bias in Legal Contextualized Language Models

The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated

UnMASKed: Quantifying Gender Biases in Masked Language Models through Linguistically Informed Job Market Prompts

GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models