BIGbench: A Unified Benchmark for Social Bias in Text-to-Image Generative Models Based on Multi-modal LLM

Hanjun Luo, Haoyu Huang, Ziye Deng, Xuecheng Liu, Ruizhe Chen, Zuozhu Liu
2024-08-16
Abstract: Text-to-Image (T2I) generative models are becoming increasingly crucial due to their ability to generate high-quality images, which also raises concerns about the social biases in their outputs, especially when generating images of people. Sociological research has established systematic classifications of bias. However, existing bias research on T2I models conflates different types of bias, impeding methodological progress. In this paper, we introduce BIGbench, a unified benchmark for Biases of Image Generation, featuring a meticulously designed dataset. Unlike existing benchmarks, BIGbench classifies and evaluates biases across four dimensions: manifestation of bias, visibility of bias, acquired attributes, and protected attributes, which ensures exceptional accuracy for analysis. Furthermore, BIGbench applies advanced multi-modal large language models to achieve fully automated and highly accurate evaluations. We apply BIGbench to evaluate eight representative general T2I models and three debiasing methods. Our human evaluation results underscore BIGbench's effectiveness in aligning images and identifying various biases. In addition, our study reveals new research directions about biases, such as the effect of distillation and irrelevant protected attributes. Our benchmark is openly accessible at https://github.com/BIGbench2024/BIGbench2024/ to ensure reproducibility.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper aims to address the issue of social bias in Text-to-Image (T2I) generation models. Specifically, researchers have found that existing T2I models exhibit significant and systematic social biases when generating images, especially those of humans. The paper identifies two shortcomings in existing work on this problem:

1. **Confusion of Different Types of Bias**: Existing research often conflates different types of social biases, which hinders methodological progress.
2. **Insufficient Benchmarking**: Current bias evaluation benchmarks are limited in terms of the number of prompts, coverage, and model comparison, making it difficult to comprehensively assess bias.

To address these issues, the research team proposed a unified bias benchmarking framework called BIGbench. This framework has the following features:

- **Four-Dimensional Bias Classification System**: Biases are classified along four dimensions: manifestation (neglect and discrimination), visibility (implicit and explicit bias), acquired attributes (such as occupation and social relationships), and protected attributes (such as gender, race, and age).
- **Large-Scale Dataset**: Contains 47,040 prompts covering aspects such as occupation, characteristics, and social relationships.
- **Automated Evaluation**: Uses multimodal large language models to achieve fully automated and highly accurate evaluation.
- **Model Comparison**: Evaluates eight representative T2I models and three debiasing methods, and validates BIGbench itself through human evaluation.

With this benchmarking framework, researchers can directly compare the degree of bias between different models and also explore new research directions, such as the impact of distillation techniques and the role of irrelevant protected attributes.
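
To make the four-dimensional classification more concrete, the following is a minimal Python sketch of how a single evaluation record might be tagged along the four dimensions named above. It is an illustration only, not BIGbench's actual schema: the class names, the field names, the `count_by_dimensions` helper, and the two example prompt strings are all hypothetical. Only the dimension names and their example values (neglect/discrimination, implicit/explicit, occupation, gender) come from the summary.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Manifestation(Enum):
    """How the bias shows up (values taken from the summary)."""
    NEGLECT = "neglect"
    DISCRIMINATION = "discrimination"


class Visibility(Enum):
    """Whether the bias is implicit or explicit (values taken from the summary)."""
    IMPLICIT = "implicit"
    EXPLICIT = "explicit"


@dataclass(frozen=True)
class BiasRecord:
    """Hypothetical record: one prompt tagged along the four dimensions."""
    prompt: str                  # hypothetical prompt text
    manifestation: Manifestation
    visibility: Visibility
    acquired_attribute: str      # e.g. "occupation", "social relationship"
    protected_attribute: str     # e.g. "gender", "race", "age"


def count_by_dimensions(records):
    """Count records per (visibility, protected attribute) pair."""
    return Counter((r.visibility, r.protected_attribute) for r in records)


# Two made-up records, for illustration only.
records = [
    BiasRecord("a photo of a nurse", Manifestation.NEGLECT,
               Visibility.IMPLICIT, "occupation", "gender"),
    BiasRecord("a photo of a female nurse", Manifestation.DISCRIMINATION,
               Visibility.EXPLICIT, "occupation", "gender"),
]

counts = count_by_dimensions(records)
```

Keeping the four dimensions as separate fields, rather than one combined label, reflects the summary's point that different types of bias should not be conflated: results can then be grouped or filtered along any one dimension independently.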