Uncovering Regional Defaults from Photorealistic Forests in Text-to-Image Generation with DALL-E 2

Zilong Liu, Krzysztof Janowicz, Kitty Currier, Meilin Shi
2024-10-04
Abstract: Regional defaults describe the emerging phenomenon that text-to-image (T2I) foundation models used in generative AI are prone to over-proportionally depicting certain geographic regions to the exclusion of others. In this work, we introduce a scalable evaluation for uncovering such regional defaults. The evaluation consists of region hierarchy-based image generation and cross-level similarity comparisons. We carry out an experiment by prompting DALL-E 2, a state-of-the-art T2I generation model capable of generating photorealistic images, to depict a forest. We select forest as an object class that displays regional variation and can be characterized using spatial statistics. For a region in the hierarchy, our experiment reveals the regional defaults implicit in DALL-E 2, along with their scale-dependent nature and spatial relationships. In addition, we discover that the implicit defaults do not necessarily correspond to the most widely forested regions in reality. Our findings underscore a need for further investigation into the geography of T2I generation and other forms of generative AI.
Computers and Society, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses **regional defaults in text-to-image (T2I) generation models**: whether these models over-represent certain geographic regions while neglecting others when generating images. Such a tendency can cause the characteristics of a few regions to be treated as typical of an entire object class, which affects the fairness and accuracy of the model. To study this, the paper chooses "forest" as its object class, because forests vary significantly across geographic regions and can be analyzed quantitatively with spatial statistics. It builds an experimental framework on a region hierarchy, uses DALL·E 2 to generate forest images at different geographic levels, and reveals the model's regional defaults and their scale dependence through image similarity comparisons.

### Main research methods

1. **Region-based forest image generation**: Using the ISO 3166 list of countries and regions together with the United Nations regional codes, the authors construct a multi-level region hierarchy running from the world down to countries or territories. Region names from each level are supplied to DALL·E 2 to generate the corresponding forest images.
2. **Cross-level image similarity comparison**: The similarity between images generated at different levels is computed with two measures, mean squared error (MSE) and the structural similarity index measure (SSIM). MSE is the average squared difference between pixel values; SSIM compares luminance, contrast, and structure.

### Key findings

1. **Low-level regional defaults are inconsistent with high-level ones**: Defaults found at lower levels of the hierarchy are not always geographically consistent with those found at higher levels. For example, at the country level MSE ranks Nauru as most similar to the world-level image, whereas at the UN region level the Americas are most similar to the world level.
2. **SSIM is better suited than MSE to thresholding**: When a similarity threshold has to be set, SSIM is the more useful measure. The frequency distribution of SSIM values suggests 0.21 as a threshold for separating regions of high and low similarity.
3. **Regional defaults do not necessarily correspond to the regions with the highest actual forest coverage**: A comparison with data from the FAO Global Forest Resources Assessment (FRA) 2020 shows that the model's regional defaults do not always match the regions with the most forest cover in reality.

### Conclusions and future work

The study shows that T2I generation models exhibit regional bias and that this bias is scale-dependent. Future work can extend the evaluation to more object classes and models, and can examine how to handle ambiguous place names in prompts. The authors also point to a geographic information observatory, which would evaluate more systematically how generative AI represents geographic information, as an important direction.
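
As a rough illustration of the cross-level similarity comparison listed under the main research methods, the sketch below scores child-level images against a parent-level image with MSE and SSIM, then applies the 0.21 SSIM threshold mentioned in the key findings. It is not the authors' implementation. Several things are assumptions made for this example: images are flattened grayscale pixel lists, SSIM is computed once over the whole image rather than with the sliding window that image libraries usually apply, a score at or above the threshold is treated as "similar", and the names `mse`, `global_ssim`, and `compare_to_parent` along with the toy data are hypothetical.

```python
from statistics import fmean


def mse(a, b):
    """Mean squared error between two equally sized grayscale pixel lists."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return fmean((x - y) ** 2 for x, y in zip(a, b))


def global_ssim(a, b, data_range=255.0):
    """Single-window SSIM over the whole image (a simplification, see above)."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_a, mu_b = fmean(a), fmean(b)
    var_a = fmean((x - mu_a) ** 2 for x in a)
    var_b = fmean((y - mu_b) ** 2 for y in b)
    cov = fmean((x - mu_a) * (y - mu_b) for x, y in zip(a, b))
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def compare_to_parent(parent_pixels, children, threshold=0.21):
    """Score each child region's image against its parent-level image."""
    results = {}
    for name, pixels in children.items():
        score = global_ssim(parent_pixels, pixels)
        results[name] = {
            "mse": mse(parent_pixels, pixels),
            "ssim": score,
            # Assumption: at or above the threshold counts as "similar".
            "similar": score >= threshold,
        }
    return results


# Toy 2x2 "images" flattened to lists; real inputs would be generated images.
world = [10, 200, 30, 180]
regions = {
    "Region A": [12, 198, 33, 176],  # close to the parent-level image
    "Region B": [200, 10, 180, 30],  # inverted light/dark pattern
}
scores = compare_to_parent(world, regions)
for name, s in scores.items():
    print(name, round(s["mse"], 2), round(s["ssim"], 3), s["similar"])
```

In this toy data, Region A falls above the threshold and Region B below it. MSE is unbounded and depends on the pixel value range, while SSIM stays within a fixed range, which is one plausible reason a fixed cut-off is easier to choose for SSIM.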