Abstract: Recent work has identified substantial disparities in generated images of different geographic regions, including stereotypical depictions of everyday objects like houses and cars. However, existing measures for these disparities have been limited to either human evaluations, which are time-consuming and costly, or automatic metrics evaluating full images, which are unable to attribute these disparities to specific parts of the generated images. In this work, we introduce a new set of metrics, Decomposed Indicators of Disparities in Image Generation (Decomposed-DIG), that allows us to separately measure geographic disparities in the depiction of objects and backgrounds in generated images. Using Decomposed-DIG, we audit a widely used latent diffusion model and find that generated images depict objects with better realism than backgrounds and that backgrounds in generated images tend to contain larger regional disparities than objects. We use Decomposed-DIG to pinpoint specific examples of disparities, such as stereotypical background generation in Africa, struggling to generate modern vehicles in Africa, and unrealistically placing some objects in outdoor settings. Informed by our metric, we use a new prompting structure that enables a 52% worst-region improvement and a 20% average improvement in generated background diversity.

What problem does this paper attempt to address?

This paper addresses the substantial disparities in how text-to-image generation models depict different geographic regions. These disparities show up as stereotypical depictions of everyday objects such as houses and cars. Existing evaluation methods either rely on time-consuming and costly human evaluations or use automatic metrics that score full images and therefore cannot attribute disparities to specific parts of the generated images. The authors therefore introduce a new set of metrics, Decomposed-DIG (Decomposed Indicators of Disparities in Image Generation), which measures geographic disparities in objects and in backgrounds separately.

### Main contributions of the paper

1. **Introduction of new metrics**: The authors propose Decomposed-DIG, which evaluates geographic disparities in the objects and the backgrounds of generated images separately.
2. **Discovery of specific problems**: Using Decomposed-DIG, the authors find that objects in generated images are more realistic than backgrounds, and that backgrounds show larger disparities across geographic regions.
3. **Improvement of prompt strategies**: The authors use a new prompt structure that substantially improves background diversity in the worst-performing region and improves the quality of generated images.

### Specific problems solved

- **Differences between objects and backgrounds**: Objects in generated images are usually more realistic than backgrounds, and the geographic disparities in backgrounds are larger.
- **Problems in specific regions**: For example, backgrounds generated for Africa often contain stereotypes (such as rural dirt road scenes), the model struggles to generate modern vehicles for Africa, and some objects are placed unrealistically in outdoor settings.
- **Improvement of diversity**: With the new prompt structure, background diversity improves by 52% in the worst-performing region and by 20% on average.

### Methodology

1. **Segmenting objects and backgrounds**: Use a state-of-the-art segmentation model (SAM) to divide each image into object and background parts.
2. **Feature extraction**: Use a Vision Transformer (ViT) to extract features of the objects and backgrounds.
3. **Measuring differences**: Compute precision and coverage for objects and for backgrounds to evaluate the realism and diversity of generated images.

### Experimental results

- **Objects are more realistic than backgrounds**: The precision of objects is higher than that of backgrounds, indicating that generated objects are closer to real objects than generated backgrounds are to real backgrounds.
- **Greater geographic disparities in backgrounds**: The geographic disparities in backgrounds are 1.7 times those of objects.
- **Effect of improved prompt strategies**: The new prompt structure significantly improves background diversity, while slightly improving the realism and diversity of objects.

### Conclusion

By introducing Decomposed-DIG, the authors reveal geographic disparities in the images generated by a widely used latent diffusion model (LDM) and propose preliminary mitigation measures. This provides a basis for more detailed analysis and mitigation strategies in the future, aimed at improving the geographic inclusiveness and accuracy of text-to-image generation models.
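
The three methodology steps can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the object mask (for example from SAM) and the feature vectors (for example from a ViT) are already available as plain arrays, and it assumes the common k-nearest-neighbor definitions of precision and coverage, which the summary above does not spell out. The function names `split_by_mask`, `knn_radii`, and `precision_coverage`, the `fill` value, and the default `k` are all hypothetical choices made for illustration.

```python
import numpy as np


def split_by_mask(image, mask, fill=0):
    """Split an (H, W, C) image into object-only and background-only copies.

    `mask` is an (H, W) boolean array, True on object pixels. Filling the
    removed region with a constant is an assumption of this sketch.
    """
    mask3 = mask[..., None]  # broadcast the mask over the channel axis
    object_only = np.where(mask3, image, fill)
    background_only = np.where(mask3, fill, image)
    return object_only, background_only


def pairwise_dist(a, b):
    """Euclidean distances between rows of a (n, d) and rows of b (m, d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def knn_radii(real, k):
    """Distance from each real feature to its k-th nearest real neighbor."""
    dists = pairwise_dist(real, real)
    # After sorting, column 0 is the zero self-distance, so column k
    # holds the k-th nearest neighbor distance.
    return np.sort(dists, axis=1)[:, k]


def precision_coverage(real, fake, k=3):
    """k-NN precision and coverage of generated features against real ones.

    precision: fraction of generated samples inside at least one real ball.
    coverage:  fraction of real balls containing at least one generated sample.
    """
    radii = knn_radii(real, k)            # (n_real,)
    dists = pairwise_dist(real, fake)     # (n_real, n_fake)
    inside = dists <= radii[:, None]      # inside[i, j]: fake j is in real ball i
    precision = inside.any(axis=0).mean()
    coverage = inside.any(axis=1).mean()
    return float(precision), float(coverage)
```

Under these assumptions, calling `precision_coverage` once on object-only features and once on background-only features, per region, would give the separate object and background scores that a decomposed comparison needs.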

Decomposed evaluations of geographic disparities in text-to-image models

Towards Geographic Inclusion in the Evaluation of Text-to-Image Models

Inspecting the Geographical Representativeness of Images from Text-to-Image Models

Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance

Evaluation of Geographical Distortions in Language Models: A Crucial Step Towards Equitable Representations

Uncovering Regional Defaults from Photorealistic Forests in Text-to-Image Generation with DALL-E 2

Community-Aware Photo Quality Evaluation by Deeply Encoding Human Perception

Attribute Based Interpretable Evaluation Metrics for Generative Models

Towards Reliable Assessments of Demographic Disparities in Multi-Label Image Classifiers

MIST: Mitigating Intersectional Bias with Disentangled Cross-Attention Editing in Text-to-Image Diffusion Models

Mitigating Social Biases in Text-to-Image Diffusion Models Via Linguistic-Aligned Attention Guidance

Measuring Geographic Performance Disparities of Offensive Language Classifiers

Image Generation Diversity Issues and How to Tame Them

Exploring Social Bias in Downstream Applications of Text-to-Image Foundation Models

TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models

New Job, New Gender? Measuring the Social Bias in Image Generation Models

Consistency-diversity-realism Pareto fronts of conditional image generative models

Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation

Stable Bias: Analyzing Societal Representations in Diffusion Models

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models

DiffusionPID: Interpreting Diffusion via Partial Information Decomposition