Abstract: Regional defaults describe the emerging phenomenon that text-to-image (T2I) foundation models used in generative AI are prone to over-proportionally depicting certain geographic regions to the exclusion of others. In this work, we introduce a scalable evaluation for uncovering such regional defaults. The evaluation consists of region hierarchy--based image generation and cross-level similarity comparisons. We carry out an experiment by prompting DALL-E 2, a state-of-the-art T2I generation model capable of generating photorealistic images, to depict a forest. We select forest as an object class that displays regional variation and can be characterized using spatial statistics. For a region in the hierarchy, our experiment reveals the regional defaults implicit in DALL-E 2, along with their scale-dependent nature and spatial relationships. In addition, we discover that the implicit defaults do not necessarily correspond to the most widely forested regions in reality. Our findings underscore a need for further investigation into the geography of T2I generation and other forms of generative AI.

What problem does this paper attempt to address?

The problem this paper attempts to solve is the phenomenon of **regional defaults in text-to-image (T2I) generation models**. Specifically, the researchers ask whether these models tend to over-represent certain geographic regions while ignoring others when generating images. Such a tendency may cause the characteristics of a few regions to be treated as typical of an entire object class, which affects the fairness and accuracy of the model.

To explore this issue, the paper selects "forest" as the object class, because forests vary significantly across geographic regions and can be analyzed quantitatively using spatial statistics. The study builds an experimental framework on a region hierarchy, uses DALL-E 2 to generate forest images at different geographic levels, and reveals the model's regional defaults and their scale dependence through image similarity comparison.

### Main research methods

1. **Region-based forest image generation**: Using the ISO 3166 list of countries and regions and the United Nations regional codes, a multi-level region hierarchy is constructed, from the world down to countries or territories. The names of regions at each level are given to DALL-E 2 as input to generate the corresponding forest images.
2. **Cross-level image similarity comparison**: The similarity of generated images between levels is calculated with two measures: mean squared error (MSE) and the structural similarity index measure (SSIM). MSE measures the average squared difference of pixel values, while SSIM takes into account similarity in brightness, contrast, and structure.

### Key findings

1. **Low-level regional defaults are inconsistent with high-level ones**: Low-level regional defaults are not always geographically consistent with high-level defaults. For example, according to MSE, Nauru is the most similar to the global level, whereas at the UN regional level, the Americas are the most similar to the global level.
2. **SSIM is preferable to MSE when a threshold is needed**: When a similarity threshold has to be set, SSIM has advantages over MSE. The frequency distribution of SSIM indicates that 0.21 can be used as a threshold to distinguish regions with high and low similarity.
3. **Regional defaults do not necessarily correspond to the regions with the highest actual forest coverage**: A comparison with FRA 2020 data shows that the regional defaults produced by the model do not always match the regions with the highest actual forest coverage.

### Conclusions and future work

The study reveals regional bias in T2I generation models and shows that this bias is scale-dependent. Future research can be extended to more object categories and models, and can further explore how to handle the ambiguity of place names in prompts. In addition, establishing a geographic information observatory to evaluate more systematically how generative AI represents geographic information is identified as an important direction.
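The cross-level similarity comparison described under the research methods can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes images are already converted to flat lists of grayscale pixel values of equal length, and it uses a simplified single-window ("global") SSIM rather than the usual sliding-window version, so its scores are not directly comparable to the 0.21 threshold reported above. The names `mse`, `global_ssim`, `most_similar_child`, and the toy pixel data are all hypothetical.

```python
from statistics import fmean


def mse(a, b):
    """Mean squared error between two equally sized grayscale images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return fmean((x - y) ** 2 for x, y in zip(a, b))


def global_ssim(a, b, data_range=255.0):
    """Single-window SSIM computed over the whole image.

    Simplification: standard SSIM averages this statistic over local
    windows; here one window covers the entire image.
    """
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    # Stabilizing constants from the usual SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = fmean(a), fmean(b)
    var_a = fmean((x - mu_a) ** 2 for x in a)
    var_b = fmean((y - mu_b) ** 2 for y in b)
    cov = fmean((x - mu_a) * (y - mu_b) for x, y in zip(a, b))
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )


def most_similar_child(parent_img, child_imgs, metric, higher_is_better):
    """Score every child region against its parent and return the closest one."""
    scores = {name: metric(img, parent_img) for name, img in child_imgs.items()}
    pick = max if higher_is_better else min
    return pick(scores, key=scores.get), scores


# Toy stand-ins for generated images (hypothetical values, 4 pixels each).
world = [10, 20, 30, 40]
children = {
    "region_a": [12, 22, 32, 42],  # nearly identical to the parent
    "region_b": [40, 30, 20, 10],  # same pixels, reversed structure
}

# Lower MSE means more similar; higher SSIM means more similar.
default_by_mse, mse_scores = most_similar_child(world, children, mse, False)
default_by_ssim, ssim_scores = most_similar_child(world, children, global_ssim, True)
print(default_by_mse, default_by_ssim)
```

In this reading, the child region whose image is closest to its parent's image is the candidate regional default at that level. Repeating the comparison at each level of the hierarchy is what would expose the scale dependence the summary describes.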

Uncovering Regional Defaults from Photorealistic Forests in Text-to-Image Generation with DALL-E 2
