Abstract: The rapid deployment of artificial intelligence (AI) models demands a thorough investigation of biases and risks inherent in these models to understand their impact on individuals and society. This study extends the focus of bias evaluation in extant work by examining bias against social stigmas on a large scale. It focuses on 93 stigmatized groups in the United States, including a wide range of conditions related to disease, disability, drug use, mental illness, religion, sexuality, socioeconomic status, and other relevant factors. We investigate bias against these groups in English pre-trained Masked Language Models (MLMs) and their downstream sentiment classification tasks. To evaluate the presence of bias against 93 stigmatized conditions, we identify 29 non-stigmatized conditions to conduct a comparative analysis. Building upon a psychology scale of social rejection, the Social Distance Scale, we prompt six MLMs: RoBERTa-base, RoBERTa-large, XLNet-large, BERTweet-base, BERTweet-large, and DistilBERT. We use human annotations to analyze the predicted words from these models, with which we measure the extent of bias against stigmatized groups. When prompts include stigmatized conditions, the probability of MLMs predicting negative words is approximately 20 percent higher than when prompts have non-stigmatized conditions. In the sentiment classification tasks, when sentences include stigmatized conditions related to diseases, disability, education, and mental illness, they are more likely to be classified as negative. We also observe a strong correlation between bias in MLMs and their downstream sentiment classifiers (r = 0.79). The evidence indicates that MLMs and their downstream sentiment classification tasks exhibit biases against socially stigmatized groups.
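The two quantities reported in the abstract, a gap in the rate of negative predicted words and a Pearson correlation between MLM bias and classifier bias, can be illustrated with a minimal sketch. The abstract does not specify how either figure was computed, so the function names `negative_rate` and `pearson_r`, the 0/1 annotation encoding, and the toy values below are all assumptions made for illustration. They are not the authors' implementation, and the numbers are not the study's data.

```python
from math import sqrt


def negative_rate(annotations):
    """Fraction of predicted words annotated as negative.

    `annotations` is a hypothetical list of 0/1 labels
    (1 = annotated negative, 0 = not negative).
    """
    return sum(annotations) / len(annotations)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy annotations for words predicted from prompts (illustrative only).
stigmatized = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
non_stigmatized = [1, 0, 0, 1, 0, 0, 0, 1, 0, 1]

# Gap in negative-word rate between the two prompt groups.
gap = negative_rate(stigmatized) - negative_rate(non_stigmatized)

# Toy per-model bias scores: one MLM score and one classifier score per model.
mlm_bias = [0.10, 0.25, 0.18, 0.30]
classifier_bias = [0.05, 0.20, 0.16, 0.22]
r = pearson_r(mlm_bias, classifier_bias)

print(f"gap in negative rate: {gap:.2f}")
print(f"Pearson r: {r:.2f}")
```

Comparing rates against a non-stigmatized control group, rather than reading the stigmatized rate in isolation, is what makes the gap interpretable as bias; this mirrors the comparative design the abstract describes.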

Analysis and Mitigation of Religion Bias in Indonesian Natural Language Processing Datasets

Do the Right Thing, Just Debias! Multi-Category Bias Mitigation Using LLMs

Toward Understanding Bias Correlations for Mitigation in NLP

Towards Understanding and Mitigating Social Biases in Language Models

An Analysis of Social Biases Present in BERT Variants Across Multiple Languages

BiasDPO: Mitigating Bias in Language Models through Direct Preference Optimization

On Bias and Fairness in NLP: Investigating the Impact of Bias and Debiasing in Language Models on the Fairness of Toxicity Detection

Comparing Biases and the Impact of Multilingual Training across Multiple Languages

Exploring Bengali Religious Dialect Biases in Large Language Models with Evaluation Perspectives

Mitigating Large Language Model Bias: Automated Dataset Augmentation and Prejudice Quantification

The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated

Towards Resource Efficient and Interpretable Bias Mitigation in Large Language Models

Towards Debiasing NLU Models from Unknown Biases

Bias Against 93 Stigmatized Groups in Masked Language Models and Downstream Sentiment Classification Tasks

Indian-BhED: A Dataset for Measuring India-Centric Biases in Large Language Models

The "Colonial Impulse" of Natural Language Processing: An Audit of Bengali Sentiment Analysis Tools and Their Identity-based Biases

Covert Bias: The Severity of Social Views' Unalignment in Language Models Towards Implicit and Explicit Opinion

From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings

Enhancing the fairness of offensive memes detection models by mitigating unintended political bias

Interpreting Bias in Large Language Models: A Feature-Based Approach

A Trip Towards Fairness: Bias and De-Biasing in Large Language Models