Gender Bias Evaluation in Text-to-image Generation: A Survey

Yankun Wu, Yuta Nakashima, Noa Garcia
2024-08-21
Abstract: The rapid development of text-to-image generation has brought rising ethical considerations, especially regarding gender bias. Given a text prompt as input, text-to-image models generate images according to the prompt. Pioneering models such as Stable Diffusion and DALL-E 2 have demonstrated remarkable capabilities in producing high-fidelity images from natural language prompts. However, these models often exhibit gender bias, as shown by their tendency to generate images of men from prompts such as "a photo of a software developer". Given the widespread application and increasing accessibility of these models, bias evaluation is crucial for regulating the development of text-to-image generation. Unlike image quality or fidelity, which have well-established metrics, bias is challenging to evaluate and lacks standard approaches. Although biases related to other factors, such as skin tone, have been explored, gender bias remains the most extensively studied. In this paper, we review recent work on gender bias evaluation in text-to-image generation, covering bias evaluation setup, bias evaluation metrics, and findings and trends. We primarily focus on the evaluation of recent popular models such as Stable Diffusion, a diffusion model operating in the latent space and using CLIP text embedding, and DALL-E 2, a diffusion model leveraging Seq2Seq architectures like BART. By analyzing recent work and discussing trends, we aim to provide insights for future work.
Computers and Society
What problem does this paper attempt to address?
This paper addresses the evaluation of gender bias in text-to-image generation models. The rapid development of text-to-image generation has raised a growing number of ethical issues, gender bias chief among them. Given a text prompt as input, these models generate images according to the prompt. However, research shows that for certain prompts (such as "a photo of a software developer") the models tend to generate images of men. The paper reviews recent work on gender bias evaluation in text-to-image generation, covering three aspects:

1. **Bias evaluation settings**: gender and bias definitions, prompt design, and attribute classification.
2. **Bias evaluation metrics**: divided into distribution metrics, bias-propensity metrics, and quality metrics.
3. **Findings and trends**: a summary of the main findings of existing research and an analysis of current research trends.

Through this review, the authors hope to provide insights for future research on improving the fairness of text-to-image generation models and reducing gender bias.

### Specific problem description

- **Manifestation of gender bias**: For example, given a gender-neutral prompt (such as "a photo of a software developer"), the model is more likely to generate images of men.
- **Evaluation challenges**: Unlike image quality or fidelity, which have established metrics, gender bias lacks standard evaluation methods.
- **Wide application**: As these models become more widely used, bias evaluation is crucial for regulating their development.

### Main contributions of the paper

- **Systematic review**: Reviews existing research comprehensively, covering different models (such as Stable Diffusion and DALL-E 2) and evaluation methods.
- **Identifying challenges**: Points out the challenges and shortcomings of current evaluation methods.
- **Future directions**: Provides guidance and suggestions for future gender bias evaluation and mitigation methods.

### Key formulas and concepts

- **Mean Absolute Deviation (MAD)**:
  \[ \text{MAD}=\frac{1}{n}\sum_{i=1}^{n}|x_i-\bar{x}| \]
  where \(x_i\) is the \(i\)-th value of the detected attribute distribution, \(\bar{x}\) is its value under the unbiased distribution, and \(n\) is the number of attribute classes.
- **Chi-square test**:
  \[ \chi^2=\sum_{i=1}^{k}\frac{(O_i-E_i)^2}{E_i} \]
  where \(O_i\) is the observed frequency and \(E_i\) is the expected frequency of class \(i\), and \(k\) is the number of classes.
- **Cosine Similarity**:
  \[ \text{Cosine Similarity}=\frac{\mathbf{A}\cdot\mathbf{B}}{\|\mathbf{A}\|\|\mathbf{B}\|} \]
  where \(\mathbf{A}\) and \(\mathbf{B}\) are two vectors.

With these methods and techniques, the paper aims to provide a comprehensive framework that helps researchers better understand and evaluate gender bias in text-to-image generation models.
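
The three formulas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the evaluation code of the survey or of any paper it reviews. The function names and the example counts are hypothetical. The sketch also assumes that the unbiased reference is the uniform distribution over the detected classes. Under that assumption, the mean of proportions that sum to 1 equals \(1/n\), so the MAD formula can be applied as written.

```python
import math
from typing import Sequence


def mean_absolute_deviation(proportions: Sequence[float]) -> float:
    """MAD of detected class proportions around their mean.

    Assumption: the proportions sum to 1, so their mean is 1/n,
    which is the uniform (here taken as unbiased) reference.
    """
    n = len(proportions)
    if n == 0:
        raise ValueError("proportions must be non-empty")
    mean = sum(proportions) / n
    return sum(abs(p - mean) for p in proportions) / n


def chi_square_statistic(observed: Sequence[float],
                         expected: Sequence[float]) -> float:
    """Pearson chi-square statistic: sum of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have equal length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical counts: an attribute classifier labels 100 images
    # generated from one gender-neutral prompt as [man, woman].
    counts = [80, 20]
    total = sum(counts)
    proportions = [c / total for c in counts]
    # Expected counts under the assumed uniform (unbiased) reference.
    expected = [total / len(counts)] * len(counts)

    print("MAD:", mean_absolute_deviation(proportions))
    print("chi-square:", chi_square_statistic(counts, expected))
    # Toy 2-D vectors standing in for image or text embeddings.
    print("cosine:", cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

A perfectly balanced output gives a MAD and a chi-square statistic of zero, and both grow as the detected distribution moves away from the reference. In practice the chi-square statistic would be compared against a critical value or converted to a p-value, which this sketch leaves out.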