Abstract: Pre-trained language models trained on large-scale data have learned serious levels of social bias. Consequently, various methods have been proposed to debias pre-trained models. Debiasing methods need to remove only discriminatory bias information from the pre-trained models, while retaining information that is useful for downstream tasks. In previous research, whether useful information is retained has been checked by measuring the downstream task performance of debiased pre-trained models. However, it is not clear whether these benchmarks contain data pertaining to social biases and are therefore appropriate for investigating the impact of debiasing. For example, in gender-related social biases, data containing female words (e.g. ``she, female, woman''), male words (e.g. ``he, male, man''), and stereotypical words (e.g. ``nurse, doctor, professor'') are considered to be the most affected by debiasing. If a benchmark dataset for a target task contains few instances with these words, the effects of debiasing may be evaluated erroneously. In this study, we compare the impact of debiasing on performance across multiple downstream tasks using a wide range of benchmark datasets that contain female, male, and stereotypical words. Experiments show that the effects of debiasing are consistently \emph{underestimated} across all tasks. Moreover, the effects of debiasing can be evaluated more reliably by separately considering instances containing female, male, and stereotypical words than by considering all of the instances in a benchmark dataset.
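The subset-based evaluation described in the abstract can be illustrated with a minimal sketch. It assumes a classification benchmark given as (text, gold label) pairs and simple whole-word matching. The word lists are only the examples quoted in the abstract, and the function names, toy data, and predictions are hypothetical, not the authors' implementation or results.

```python
import re

# Illustrative word lists (assumption: only the examples quoted in the abstract).
GROUPS = {
    "female": {"she", "female", "woman"},
    "male": {"he", "male", "man"},
    "stereotype": {"nurse", "doctor", "professor"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def accuracy_by_group(instances, predictions):
    """Accuracy over all instances and over each word-group subset.

    instances: list of (text, gold_label) pairs.
    predictions: list of predicted labels, aligned with instances.
    A subset with no matching instances gets None.
    """
    hits = {"all": [], **{name: [] for name in GROUPS}}
    for (text, gold), pred in zip(instances, predictions):
        correct = int(pred == gold)
        hits["all"].append(correct)
        tokens = tokenize(text)
        for name, words in GROUPS.items():
            if tokens & words:  # instance contains at least one group word
                hits[name].append(correct)
    return {name: (sum(v) / len(v) if v else None) for name, v in hits.items()}


# Toy data and predictions (hypothetical, for illustration only).
data = [
    ("She is a nurse.", 1),
    ("The weather is fine.", 0),
    ("He fixed the car.", 1),
    ("The man is a doctor.", 0),
]
original_preds = [1, 0, 1, 0]
debiased_preds = [0, 0, 1, 1]

before = accuracy_by_group(data, original_preds)
after = accuracy_by_group(data, debiased_preds)
for name in before:
    print(name, before[name], after[name])
```

In this toy setup, the drop in accuracy on the full set is smaller than the drop on the female and stereotype subsets, which is the kind of gap the abstract refers to when it says that aggregate scores underestimate the effect of debiasing.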

The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated
