BIGbench: A Unified Benchmark for Social Bias in Text-to-Image Generative Models Based on Multi-modal LLM

Hanjun Luo, Haoyu Huang, Ziye Deng, Xuecheng Liu, Ruizhe Chen, Zuozhu Liu

2024-08-16

Abstract: Text-to-Image (T2I) generative models are becoming increasingly crucial due to their ability to generate high-quality images, which also raises concerns about the social biases in their outputs, especially in the generation of humans. Sociological research has established systematic classifications of bias. However, existing bias research on T2I models conflates different types of bias, impeding methodological progress. In this paper, we introduce BIGbench, a unified benchmark for Biases of Image Generation, featuring a meticulously designed dataset. Unlike existing benchmarks, BIGbench classifies and evaluates biases across four dimensions: manifestation of bias, visibility of bias, acquired attributes, and protected attributes, which ensures exceptional accuracy for analysis. Furthermore, BIGbench applies advanced multi-modal large language models to achieve fully automated and highly accurate evaluations. We apply BIGbench to evaluate eight representative general T2I models and three debiased methods. Our human evaluation results underscore BIGbench's effectiveness in aligning images and identifying various biases. Our study also reveals new research directions about biases, such as the effect of distillation and irrelevant protected attributes. Our benchmark is openly accessible at https://github.com/BIGbench2024/BIGbench2024/ to ensure reproducibility.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper aims to address the issue of social bias in Text-to-Image (T2I) generation models. Existing T2I models exhibit significant and systematic social biases when generating images, especially images of humans. The paper identifies two main problems in existing research:

1. **Confusion of Different Types of Bias**: Existing research often conflates different types of social bias, which hinders methodological progress.
2. **Insufficient Benchmarking**: Current bias evaluation benchmarks are limited in the number of prompts, coverage, and model comparison, making it difficult to assess bias comprehensively.

To address these issues, the research team proposes a unified bias benchmarking framework called BIGbench. This framework has the following features:

- **Four-Dimensional Bias Classification System**: Biases are classified along four dimensions: manifestation (neglect and discrimination), visibility (implicit and explicit bias), acquired attributes (such as occupation, social relationships, etc.), and protected attributes (such as gender, race, age, etc.).
- **Large-Scale Dataset**: Contains 47,040 prompts covering aspects such as occupation, characteristics, and social relationships.
- **Automated Evaluation**: Uses multimodal large language models to achieve fully automated, high-precision evaluation.
- **Model Comparison**: Evaluates eight representative T2I models and three debiasing methods, and validates the effectiveness of BIGbench through human evaluation.

With this benchmarking framework, researchers can compare the degree of bias across models and also explore new research directions, such as the impact of distillation techniques and the role of irrelevant protected attributes.
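
To make the four-dimensional classification more concrete, the following is a minimal sketch of how a prompt annotated along those dimensions might be represented, together with one simple way to quantify how far generated images deviate from a reference distribution. Everything here is an assumption for illustration: the names `PromptRecord`, `Manifestation`, `Visibility`, and `distribution_gap` are hypothetical, the example prompt is invented, and the total variation distance is a generic stand-in, not the metric BIGbench itself uses.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Manifestation(Enum):
    # The two manifestations named in the summary above.
    NEGLECT = "neglect"
    DISCRIMINATION = "discrimination"


class Visibility(Enum):
    # The two visibility levels named in the summary above.
    IMPLICIT = "implicit"
    EXPLICIT = "explicit"


@dataclass(frozen=True)
class PromptRecord:
    """Hypothetical record for one prompt; not the BIGbench dataset schema."""
    text: str
    manifestation: Manifestation
    visibility: Visibility
    acquired_attribute: str   # e.g. "occupation"
    protected_attribute: str  # e.g. "gender"


def distribution_gap(labels, reference):
    """Total variation distance between observed label frequencies and a
    reference distribution (an assumed, generic measure of skew)."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    categories = set(counts) | set(reference)
    return 0.5 * sum(
        abs(counts.get(c, 0) / total - reference.get(c, 0.0))
        for c in categories
    )


# Invented example: labels that an automated evaluator might assign
# to ten images generated from a single prompt.
record = PromptRecord(
    text="a photo of a nurse",
    manifestation=Manifestation.NEGLECT,
    visibility=Visibility.IMPLICIT,
    acquired_attribute="occupation",
    protected_attribute="gender",
)
observed = ["female"] * 9 + ["male"]
gap = distribution_gap(observed, {"female": 0.5, "male": 0.5})
```

In this invented example the gap is roughly 0.4 on a scale from 0 (matches the reference) to 1 (fully disjoint). The choice of reference distribution is itself a modelling decision that the sketch leaves open.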

Quantifying Bias in Text-to-Image Generative Models

TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models

Discovering Biases in Image Datasets with the Crowd

GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models

Survey of Bias In Text-to-Image Generation: Definition, Evaluation, and Mitigation

VersusDebias: Universal Zero-Shot Debiasing for Text-to-Image Models via SLM-Based Prompt Engineering and Generative Adversary

OpenBias: Open-set Bias Detection in Text-to-Image Generative Models

Mitigating Social Biases in Text-to-Image Diffusion Models Via Linguistic-Aligned Attention Guidance

New Job, New Gender? Measuring the Social Bias in Image Generation Models

FAIntbench: A Holistic and Precise Benchmark for Bias Evaluation in Text-to-Image Models

T2IAT: Measuring Valence and Stereotypical Biases in Text-to-Image Generation

VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model

On the Cultural Gap in Text-to-Image Generation

Stable Bias: Analyzing Societal Representations in Diffusion Models

Exploring Social Bias in Downstream Applications of Text-to-Image Foundation Models

Analyzing Quality, Bias, and Performance in Text-to-Image Generative Models

Gender Bias Evaluation in Text-to-image Generation: A Survey

Bias Begets Bias: The Impact of Biased Embeddings on Diffusion Models

CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models