Abstract: Natural Language Processing (NLP) models have been found to discriminate against groups of different social identities, such as gender and race. Given the negative consequences of these undesired biases, researchers have responded with unprecedented effort and proposed promising approaches for bias mitigation. Despite its considerable practical importance, the current algorithmic fairness literature lacks an in-depth understanding of the relations between different forms of bias. Social bias is complex by nature. Numerous studies in social psychology identify "generalized prejudice", i.e., generalized devaluing sentiments across different groups. For example, people who devalue ethnic minorities are also likely to devalue women and gay people. Therefore, this work aims to provide a first systematic study toward understanding bias correlations in mitigation. In particular, we examine bias mitigation in two common NLP tasks, toxicity detection and word embeddings, on three social identities: race, gender, and religion. Our findings suggest that biases are correlated, and we present scenarios in which the independent debiasing approaches dominant in the current literature may be insufficient. We further investigate whether jointly mitigating correlated biases is more desirable than independent, individual debiasing. Lastly, we shed light on the inherent trade-off between debiasing and accuracy in bias mitigation. This study serves to motivate future research on joint bias mitigation that accounts for correlated biases.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper explores the correlations between different forms of social bias in natural language processing (NLP) models and examines joint bias-mitigation strategies. Specifically, it addresses three questions:

1. **Are biases against different social identities correlated?**
   - The study finds that different forms of bias (gender, race, and religion) are correlated in both toxicity detection and word embeddings. For example, reducing gender bias may simultaneously reduce race and religion biases.
2. **Is joint bias mitigation superior to independent bias mitigation?**
   - In the experiments, joint bias mitigation reduces the total bias more than independent mitigation does. The joint strategy accounts for the correlations among social identities while still capturing the bias characteristics specific to each identity.
3. **Does joint bias mitigation require balancing accuracy against debiasing?**
   - The results show an inherent trade-off between debiasing and prediction performance during training. Although joint mitigation can reduce bias and improve prediction performance, imbalanced data distributions further exacerbate this trade-off.

### Main contributions

- **Systematic study of bias correlations**: the first systematic study of the correlations among different forms of bias in NLP tasks, with comprehensive quantitative and qualitative analyses.
- **Joint bias-mitigation strategy**: a joint debiasing strategy across multiple social identities that effectively reduces the total bias.
- **Trade-off between debiasing and accuracy**: reveals the trade-off between debiasing and prediction performance in joint bias mitigation, pointing to directions for future research.

### Experimental settings and results

#### 1. Toxicity detection

- **Dataset**: the Jigsaw dataset, containing 403,957 samples labeled with toxicity and social-identity information.
- **Compared models**:
  - Baseline model without debiasing (Biased)
  - Models debiased for a single social identity (Gender, Race, Religion)
  - Models debiased for two social identities simultaneously (Ge + Ra, Ge + Re, Ra + Re)
  - Model debiased for all three social identities simultaneously (Joint)
- **Evaluation metrics**:
  - AUC, micro-F1, and accuracy (Acc.)
  - An individual bias metric
  - A joint bias metric
- **Main results**:
  - Different forms of bias are positively correlated, and models debiased for a single identity still reduce the total bias to some extent.
  - The joint strategy reduces the total bias the most, especially race and religion bias.
  - Debiasing trades off against prediction performance, particularly when the data distribution is imbalanced.

#### 2. Word embeddings

- **Data source**: the L2-Reddit corpus, which contains Reddit posts and comments from American users.
- **Initial (biased) word embeddings**: obtained by training word2vec on approximately 56 million sentences.
- **Evaluation tasks**:
  - Quantify how the biases of the other two social identities change after removing the bias of one identity, measured with the MAC (mean average cosine similarity) metric.
  - Evaluate the effect of joint debiasing.
- **Main results**:
  - After gender bias is removed, race and religion biases also decrease, indicating that different forms of bias are correlated.
  - Sequential debiasing (first removing gender bias, then race and religion biases) outperforms independent debiasing.

### Conclusion

By systematically studying the correlations between different forms of bias in NLP tasks, this paper proposes an effective joint bias-mitigation strategy and reveals the trade-off between debiasing and prediction performance. These findings provide reference points and directions for future research.
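The toxicity-detection experiments summarized above report an individual bias metric and a joint bias metric, but the summary does not define either. As a hedged illustration only, the sketch below uses the false positive equality difference (FPED) of Dixon et al. (2018), a common group-fairness measure for toxicity classifiers, as a stand-in individual metric, and a hypothetical joint score that simply sums the per-identity values. Neither is confirmed to match the paper's definitions; the labels, predictions, and group annotations are toy values, and the function names are illustrative.

```python
from typing import Dict, List, Optional, Sequence


def false_positive_rate(labels: Sequence[int], preds: Sequence[int]) -> float:
    """FPR = FP / (FP + TN): share of non-toxic (label 0) examples predicted toxic."""
    preds_on_negatives = [p for y, p in zip(labels, preds) if y == 0]
    if not preds_on_negatives:
        return 0.0  # no non-toxic examples: treat FPR as 0 in this sketch
    return sum(preds_on_negatives) / len(preds_on_negatives)


def fped(labels: Sequence[int], preds: Sequence[int],
         groups: Sequence[Optional[str]]) -> float:
    """False positive equality difference: sum over groups g of |FPR - FPR_g|.

    groups[i] is the identity group mentioned in example i (e.g. "female"),
    or None if the example mentions no group of this identity.
    """
    overall = false_positive_rate(labels, preds)
    total = 0.0
    for g in sorted({g for g in groups if g is not None}):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        fpr_g = false_positive_rate([labels[i] for i in idx], [preds[i] for i in idx])
        total += abs(overall - fpr_g)
    return total


def bias_report(labels: Sequence[int], preds: Sequence[int],
                annotations: Dict[str, List[Optional[str]]]) -> Dict[str, float]:
    """Per-identity FPED plus a hypothetical 'joint' score (their plain sum)."""
    report = {identity: fped(labels, preds, groups)
              for identity, groups in annotations.items()}
    report["joint"] = sum(report.values())
    return report


# Toy data: 1 = toxic, 0 = non-toxic.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
preds = [1, 0, 0, 1, 0, 0, 1, 0]
gender = ["female", "male", "female", "male", None, None, "female", None]
race = ["black", None, "white", "black", "white", None, None, "black"]

report = bias_report(labels, preds, {"gender": gender, "race": race})
print({k: round(v, 3) for k, v in report.items()})
# {'gender': 0.333, 'race': 1.0, 'joint': 1.333}
```

A plain sum treats every identity equally; a real joint metric might weight identities or groups differently, for example by how often each group appears under the imbalanced distributions the summary mentions.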
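The word-embedding evaluation summarized above uses MAC to measure how removing one identity's bias changes the others, and compares sequential with independent debiasing. The sketch below is a minimal illustration under stated assumptions: MAC is taken as the mean cosine distance between target-word vectors and attribute-word sets (the definition commonly attributed to Manzini et al., 2019, under which values closer to 1 indicate less bias), and debiasing is simplified to projecting out one bias direction per identity. The bias directions, word vectors, and helper names are hypothetical toy values, not the paper's procedure.

```python
import numpy as np


def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    """1 - cosine similarity."""
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def mac(targets: np.ndarray, attribute_sets: list) -> float:
    """Mean Average Cosine: average over every (target, attribute set) pair of
    the mean cosine distance between the target and the words in that set."""
    scores = [np.mean([cosine_distance(t, a) for a in attrs])
              for t in targets for attrs in attribute_sets]
    return float(np.mean(scores))


def sequential_debias(vectors: np.ndarray, directions: list):
    """Remove one bias direction per identity, in order (e.g. gender, then race).

    Each new direction is orthogonalized against those already removed
    (Gram-Schmidt), so a later step never reintroduces a removed component.
    """
    removed = []
    for d in directions:
        d = np.asarray(d, dtype=float)
        for r in removed:
            d = d - (d @ r) * r
        norm = np.linalg.norm(d)
        if norm < 1e-12:  # direction already lies in the removed subspace
            continue
        d = d / norm
        vectors = vectors - np.outer(vectors @ d, d)  # project rows off d
        removed.append(d)
    return vectors, removed


# Toy 3-d example with hypothetical, deliberately overlapping bias directions.
gender_dir = np.array([1.0, 0.0, 0.0])
race_dir = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)  # shares a component with gender_dir
words = np.array([[1.0, 1.0, 0.5],   # a word biased along both directions
                  [0.0, 0.0, 1.0]])  # a neutral word

race_before = words @ race_dir
after_gender, _ = sequential_debias(words, [gender_dir])
race_after = after_gender @ race_dir
print(np.round(race_before, 3), np.round(race_after, 3))

# Hypothetical stereotype attribute words; MAC rises as bias is removed.
attributes = [[np.array([1.0, 0.2, 0.0]), np.array([0.9, 0.1, 0.1])]]
print(round(mac(words[:1], attributes), 3), round(mac(after_gender[:1], attributes), 3))
```

Because the toy race direction overlaps with the gender direction, removing only the gender direction shrinks the biased word's race component from about 1.414 to about 0.707, a small-scale analogue of the correlation reported above. Orthogonalizing each new direction is a design choice: without it, removing a later direction that is not orthogonal to an earlier one can reintroduce part of the earlier direction.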

Toward Understanding Bias Correlations for Mitigation in NLP