Abstract: Large Language Models (LLMs) have shown powerful performance and development prospects and are widely deployed in the real world. However, LLMs can capture social biases from unprocessed training data and propagate the biases to downstream tasks. Unfair LLM systems have undesirable social impacts and potential harms. In this paper, we provide a comprehensive review of related research on fairness in LLMs. Considering the influence of parameter magnitude and training paradigm on research strategy, we divide existing fairness research into work oriented to medium-sized LLMs under the pre-training and fine-tuning paradigm and work oriented to large-sized LLMs under the prompting paradigm. First, for medium-sized LLMs, we introduce evaluation metrics and debiasing methods from the perspectives of intrinsic bias and extrinsic bias, respectively. Then, for large-sized LLMs, we introduce recent fairness research, including fairness evaluation, reasons for bias, and debiasing methods. Finally, we discuss and provide insight on the challenges and future directions for the development of fairness in LLMs.

What problem does this paper attempt to address?

This paper addresses fairness in large language models (LLMs). Specifically, it focuses on how to evaluate and mitigate the social biases these models exhibit when processing natural language. Social biases may originate from unprocessed training data and be transmitted to downstream tasks through embedded representations, leading to unfair results. For example, some models may make discriminatory or stereotypical judgments about specific genders, races, or other social groups.

### Main problems in the paper:

1. **Propagation of social biases**: Social biases that large language models capture from unprocessed training data spread to downstream tasks and affect the fairness of decision-making.
2. **Unfair social impacts**: Biased language models may produce discriminatory, stereotypical, and demeaning decisions against vulnerable or marginalized groups, causing adverse social impacts and potential harm.
3. **Challenges of different training paradigms**: As the number of model parameters grows and new training paradigms emerge (such as the prompting paradigm), researchers face different fairness challenges. The paper divides existing fairness research into the pre-training and fine-tuning paradigm for medium-sized language models and the prompting paradigm for large-sized language models.

### Solutions in the paper:

- **Evaluation metrics**: The paper introduces metrics for evaluating intrinsic and extrinsic bias, including similarity-based metrics and probability-based metrics.
- **Debiasing methods**: For medium-sized language models, the paper discusses intrinsic and extrinsic debiasing methods; for large-sized language models, it covers fairness evaluation, reasons for bias, and debiasing methods under the prompting paradigm.
- **Future directions**: The paper also discusses current challenges and future research directions for fairness in LLMs.

### Formula explanations:

Some formulas in the text quantify the degree of bias. For example:

- Formula for intrinsic bias:

  \[ |E_i(z) - E_i(z')| > \epsilon_i \]

  where \( E_i(\cdot) \) is the intrinsic bias evaluation metric, \( z = M(x) \) and \( z' = M(x') \) are the embedded representations of samples \( x \) and \( x' \), which represent different demographic groups, and \( \epsilon_i \) is the desired fairness threshold. Read this way, intrinsic bias is flagged when the gap between the two metric values exceeds the threshold.

- Formula for extrinsic bias:

  \[ |E_e(y) - E_e(y')| > \epsilon_e \]

  where \( E_e(\cdot) \) is the extrinsic bias evaluation metric, \( y = C(M'(x)) \) and \( y' = C(M'(x')) \) are the prediction outputs of the fine-tuned model for samples \( x \) and \( x' \), and \( \epsilon_e \) is the desired fairness threshold.

In conclusion, this paper comprehensively reviews and analyzes fairness issues in large language models, surveys methods for evaluating and mitigating biases, and points out directions for future fairness research.
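The two threshold tests in the formula explanations can be illustrated with a minimal Python sketch. It is not the paper's implementation: the choice of cosine similarity to an attribute vector as the intrinsic metric, the positive-prediction rate as the extrinsic metric, the toy vectors, and the threshold value 0.1 are all assumptions made here for illustration.

```python
import math
from typing import Sequence


def exceeds_threshold(metric_a: float, metric_b: float, epsilon: float) -> bool:
    """Return True when |metric_a - metric_b| > epsilon, i.e. bias is flagged."""
    return abs(metric_a - metric_b) > epsilon


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def positive_rate(predictions: Sequence[int]) -> float:
    """Fraction of predictions equal to 1."""
    return sum(predictions) / len(predictions)


# Intrinsic check (assumed metric: similarity to a hypothetical attribute vector).
z = [1.0, 0.0]          # toy embedding z = M(x)
z_prime = [0.0, 1.0]    # toy embedding z' = M(x')
attribute = [1.0, 0.0]  # hypothetical attribute direction
intrinsic_biased = exceeds_threshold(
    cosine(z, attribute), cosine(z_prime, attribute), epsilon=0.1
)

# Extrinsic check (assumed metric: positive-prediction rate per group).
y = [1, 1, 1, 0]        # toy predictions C(M'(x)) for one group
y_prime = [1, 0, 1, 1]  # toy predictions C(M'(x')) for the other group
extrinsic_biased = exceeds_threshold(
    positive_rate(y), positive_rate(y_prime), epsilon=0.1
)

print(intrinsic_biased, extrinsic_biased)
```

In this toy setup the two embeddings differ sharply in their similarity to the attribute vector, so the intrinsic test flags bias, while both groups receive the same positive-prediction rate, so the extrinsic test does not.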

A Survey on Fairness in Large Language Models

Fairness in Large Language Models: A Taxonomic Survey

Bias and Fairness in Large Language Models: A Survey

Fairness in Large Language Models in Three Hours

Fairness Definitions in Language Models Explained

A Study of Implicit Ranking Unfairness in Large Language Models

A Comprehensive Survey of Bias in LLMs: Current Landscape and Future Directions

Exploring Accuracy-Fairness Trade-off in Large Language Models

Do Large Language Models Rank Fairly? An Empirical Study on the Fairness of LLMs as Rankers

A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias

Confronting LLMs with Traditional ML: Rethinking the Fairness of Large Language Models in Tabular Classifications

Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas: A Survey

Unveiling and Mitigating Bias in Mental Health Analysis with Large Language Models

Investigating Implicit Bias in Large Language Models: A Large-Scale Study of Over 50 LLMs

LIDAO: Towards Limited Interventions for Debiasing (Large) Language Models

Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts

Your Large Language Model is Secretly a Fairness Proponent and You Should Prompt it Like One

People's Perceptions Toward Bias and Related Concepts in Large Language Models: A Systematic Review

A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers

Unveiling Performance Challenges of Large Language Models in Low-Resource Healthcare: A Demographic Fairness Perspective