Abstract: In this work, we analyze the gender bias induced by BERT in downstream tasks, and we propose solutions to reduce it. Contextual language models (CLMs) have pushed NLP benchmarks to new heights, and using CLM-provided word embeddings in downstream tasks such as text classification has become the norm. However, unless addressed, CLMs are prone to learning the gender bias intrinsic to the dataset. As a result, the predictions of downstream NLP models can vary noticeably when gender words are varied (for example, replacing "he" with "she") or even when gender-neutral words are varied. In this paper, we focus our analysis on a popular CLM, BERT. We analyze the gender bias it induces in five downstream tasks related to emotion and sentiment intensity prediction. For each task, we train a simple regressor on BERT's word embeddings. We then evaluate the gender bias in the regressors using an equity evaluation corpus. Ideally, given their specific design, the models should discard gender-informative features from the input. However, the results show that the systems' predictions depend significantly on gender-specific words and phrases. We claim that such biases can be reduced by removing gender-specific features from the word embeddings. Hence, for each layer in BERT, we identify directions that primarily encode gender information. The space formed by such directions is referred to as the gender subspace in the semantic space of word embeddings. We propose an algorithm that finds fine-grained gender directions, i.e., one primary direction for each BERT layer. This obviates the need to realize the gender subspace in multiple dimensions and prevents other crucial information from being omitted. Experiments show that removing the embedding components along the gender directions substantially reduces BERT-induced bias in the downstream tasks.
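Here is a minimal sketch of the equity-evaluation step. It assumes an EEC-style setup: each template sentence has a person slot, the slot is filled with paired male and female phrases, and bias is read from the difference between the regressor's scores on the two variants. Several pieces are hypothetical stand-ins and do not come from the paper's code or from the corpus's actual contents: the slot marker `<person>`, the helper names `paired_gender_gaps` and `summarize`, and the toy predictor. A real run would pass the BERT-based regressor as `predict`.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def paired_gender_gaps(
    templates: Sequence[str],
    word_pairs: Sequence[Tuple[str, str]],
    predict: Callable[[str], float],
    slot: str = "<person>",  # hypothetical placeholder marker in each template
) -> List[float]:
    """Return predict(female variant) - predict(male variant) for each template/pair."""
    gaps: List[float] = []
    for template in templates:
        for male, female in word_pairs:
            male_score = predict(template.replace(slot, male))
            female_score = predict(template.replace(slot, female))
            gaps.append(female_score - male_score)
    return gaps


def summarize(gaps: Sequence[float]) -> Tuple[float, float]:
    """Mean signed gap and mean absolute gap; both are 0 for a gender-invariant model."""
    return mean(gaps), mean(abs(g) for g in gaps)


if __name__ == "__main__":
    templates = ["<person> feels angry.", "<person> made me feel sad."]
    word_pairs = [("He", "She"), ("My brother", "My sister")]

    def toy_predict(sentence: str) -> float:
        # Hypothetical biased regressor: adds 0.1 whenever a female word appears.
        words = sentence.lower().split()
        return 0.5 + (0.1 if "she" in words or "sister" in words else 0.0)

    print(summarize(paired_gender_gaps(templates, word_pairs, toy_predict)))
```

The two summaries catch different failures. A nonzero mean signed gap indicates a systematic shift toward one gender. The mean absolute gap also catches shifts whose signs cancel across templates.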
The investigation reveals significant gender bias that a contextualized language model (i.e., BERT) induces in downstream tasks. The proposed solution seems promising for reducing such biases.
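The debiasing step removes the embedding component along one gender direction per layer. The abstract does not say how the fine-grained directions are found, so this minimal sketch makes an explicit assumption: each direction is the top principal component of difference vectors between paired male and female embeddings, in the spirit of Bolukbasi et al.'s hard debiasing, computed by power iteration. The function names, the per-layer dictionaries, and the toy 4-dimensional vectors are all hypothetical. In practice, the pairs and the embeddings to be cleaned would come from the same BERT layer.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(a: Sequence[float]) -> Vector:
    n = math.sqrt(dot(a, a))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in a]


def gender_direction(
    pairs: Sequence[Tuple[Sequence[float], Sequence[float]]],
    iters: int = 200,
    seed: int = 0,
) -> Vector:
    """Estimate one unit gender direction from (male, female) embedding pairs.

    ASSUMPTION: top principal component of the pair differences d_i = m_i - f_i,
    i.e. the leading eigenvector of sum_i d_i d_i^T, found by power iteration.
    This stands in for the paper's own direction-finding algorithm.
    """
    diffs = [[m - f for m, f in zip(male, female)] for male, female in pairs]
    dim = len(diffs[0])
    rng = random.Random(seed)
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
    for _ in range(iters):
        # One power-iteration step: w = (sum_i d_i d_i^T) v, then renormalize.
        w = [0.0] * dim
        for d in diffs:
            coef = dot(d, v)
            for k in range(dim):
                w[k] += coef * d[k]
        new_v = normalize(w)
        converged = max(abs(a - b) for a, b in zip(new_v, v)) < 1e-12
        v = new_v
        if converged:
            break
    return v


def remove_component(v: Sequence[float], g: Sequence[float]) -> Vector:
    """Project v onto the orthogonal complement of unit vector g: v - (v.g) g."""
    coef = dot(v, g)
    return [x - coef * gk for x, gk in zip(v, g)]


def debias_by_layer(
    embeddings: Dict[int, Sequence[float]],
    directions: Dict[int, Sequence[float]],
) -> Dict[int, Vector]:
    """Remove each layer's own gender direction from that layer's embedding."""
    return {layer: remove_component(vec, directions[layer]) for layer, vec in embeddings.items()}


if __name__ == "__main__":
    # Hypothetical toy "layer 0" pairs in which axis 0 carries the gender signal.
    pairs = [
        ([1.0, 0.5, 0.2, 0.0], [-1.0, 0.5, 0.2, 0.0]),
        ([0.9, -0.3, 0.1, 0.4], [-0.9, -0.3, 0.1, 0.4]),
    ]
    directions = {0: gender_direction(pairs)}
    print(debias_by_layer({0: [0.7, 1.0, 2.0, 3.0]}, directions))
```

The projection `v - (v·g) g` removes a single unit direction `g` per layer, so it discards exactly one dimension of each embedding. This matches the abstract's reason for using one primary direction per layer rather than a multi-dimensional gender subspace.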
