Abstract: Although machine learning algorithms demonstrate impressive performance, their trustworthiness remains a critical issue, particularly with respect to fairness when they are deployed in real-world applications. Many notions of group fairness aim to minimize disparities in performance across protected groups. However, minimizing such disparities can inadvertently reduce performance in certain groups, leading to sub-optimal outcomes. In contrast, the Min-max group fairness notion prioritizes improving the worst-performing group, thereby advocating a utility-promoting approach to fairness. However, it has been proven that existing efforts to achieve Min-max fairness exhibit limited effectiveness. In response to this challenge, we leverage the recently proposed "Neural Collapse" framework to re-examine Empirical Risk Minimization (ERM) training, specifically investigating the root causes of poor performance in minority groups. We employ the layer-peeled model to decompose a network into two parts, an encoder that learns latent representations and a subsequent classifier, and systematically characterize their training behaviors. Our analysis reveals that while classifiers achieve maximum separation, the separability of representations is insufficient, particularly for minority groups. This indicates that the sub-optimal performance in minority groups stems from less separable representations rather than from the classifiers. To tackle this issue, we introduce a novel strategy that incorporates a frozen classifier to directly enhance the representations. Furthermore, we introduce two easily implemented loss functions to guide the learning process. Experiments on real-world benchmark datasets spanning Computer Vision, Natural Language Processing, and Tabular data demonstrate that our approach outperforms existing state-of-the-art methods in promoting the Min-max fairness notion.
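
As an illustration of two ideas mentioned in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that Min-max fairness is tracked through worst-group accuracy, and that a maximally separated classifier can be modeled as a simplex equiangular tight frame (ETF), the geometry commonly associated with Neural Collapse. The function names `worst_group_accuracy` and `simplex_etf` are hypothetical, and the abstract does not state how the frozen classifier is actually constructed or what the two loss functions are.

```python
import numpy as np


def worst_group_accuracy(y_true, y_pred, groups):
    """Return (worst accuracy, per-group accuracies) over protected groups.

    Assumption: Min-max fairness is measured here by the accuracy of the
    worst-performing group; the paper may use a different utility.
    """
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    correct = y_true == y_pred
    per_group = {
        g.item(): float(correct[groups == g].mean()) for g in np.unique(groups)
    }
    return min(per_group.values()), per_group


def simplex_etf(num_classes, dim, seed=0):
    """Build a (dim, num_classes) simplex ETF to use as fixed classifier weights.

    Columns have unit norm and pairwise cosine -1 / (num_classes - 1),
    which is the maximally separated configuration for num_classes vectors.
    """
    if dim < num_classes:
        raise ValueError("this construction needs dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Reduced QR gives a (dim, num_classes) matrix with orthonormal columns.
    u, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    k = num_classes
    centering = np.eye(k) - np.ones((k, k)) / k
    return np.sqrt(k / (k - 1)) * u @ centering


# Toy usage: group "b" is the worst-performing group.
worst, per_group = worst_group_accuracy(
    y_true=[1, 0, 1, 1], y_pred=[1, 0, 0, 1], groups=["a", "a", "b", "b"]
)

# A frozen classifier: logits are features @ W, and W is never updated,
# so training can only change the encoder's representations.
W = simplex_etf(num_classes=4, dim=16)
features = np.random.default_rng(1).standard_normal((8, 16))
logits = features @ W
```

Under these assumptions, keeping `W` fixed means the gradient signal reaches only the encoder, which matches the abstract's stated aim of improving representations rather than the classifier.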

Collapsed Language Models Promote Fairness

Your fairness may vary: Pretrained language model fairness in toxic text classification

Editable Fairness: Fine-Grained Bias Mitigation in Language Models

The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models

The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Language Models

Fair Enough: Standardizing Evaluation and Model Selection for Fairness Research in NLP

Fine-tuning a Biased Model for Improving Fairness

Fairness in Language Models Beyond English: Gaps and Challenges

Neural Collapse Inspired Debiased Representation Learning for Min-max Fairness

The Fair Language Model Paradox

A Survey on Fairness in Large Language Models

Mitigate Extrinsic Social Bias in Pre-trained Language Models Via Continuous Prompts Adjustment

Fairness Definitions in Language Models Explained

A Trip Towards Fairness: Bias and De-Biasing in Large Language Models

Biases Mitigation and Expressiveness Preservation in Language Models: A Comprehensive Pipeline (Student Abstract)

fairBERTs: Erasing Sensitive Information Through Semantic and Fairness-aware Perturbations

Towards Understanding and Mitigating Social Biases in Language Models

Can Model Compression Improve NLP Fairness

FairFix: Enhancing Fairness of Pre-Trained Deep Neural Networks with Scarce Data Resources

Bias Amplification: Language Models as Increasingly Biased Media

Model and Evaluation: Towards Fairness in Multilingual Text Classification