EarthGen: Generating the World from Top-Down Views

Ansh Sharma, Albert Xiao, Praneet Rathi, Rohit Kundu, Albert Zhai, Yuan Shen, Shenlong Wang
2024-09-08
Abstract: In this work, we present a novel method for extensive multi-scale generative terrain modeling. At the core of our model is a cascade of superresolution diffusion models that can be combined to produce consistent images across multiple resolutions. Pairing this concept with a tiled generation method yields a scalable system that can generate thousands of square kilometers of realistic Earth surfaces at high resolution. We evaluate our method on a dataset collected from Bing Maps and show that it outperforms super-resolution baselines on the extreme super-resolution task of 1024x zoom. We also demonstrate its ability to create diverse and coherent scenes via an interactive gigapixel-scale generated map. Finally, we demonstrate how our system can be extended to enable novel content creation applications including controllable world generation and 3D scene generation.
Computer Vision and Pattern Recognition, Artificial Intelligence
### What problems does this paper attempt to solve?

This paper aims to address the challenges of generating large-scale Earth landscape images. Specifically, the authors propose a new framework named EarthGen for generating Earth-observation images of infinite size, high resolution, and photorealistic quality. The following are the key problems that the paper attempts to solve:

1. **Global Consistency and Detail Diversity**:
   - Earth landscapes are highly structured and diverse at the macro level, ranging from vast plains to dense urban areas.
   - At the same time, they are full of rich details at the micro level.
   - Existing methods either have difficulty capturing the macro-structure (such as compositional methods) or lack sufficient detail when generating large-scale images (such as hierarchical methods).
2. **Extreme Super-Resolution Generation**:
   - Existing techniques perform poorly on extreme super-resolution tasks (such as 1024-fold magnification), especially in maintaining image authenticity and detail.
   - EarthGen addresses this problem with a cascade of multi-scale diffusion models, which gradually add plausible details at different scales.
3. **Infinite Scalability**:
   - Generating large-scale visual content requires that the generated images remain consistent and high-quality over an unbounded extent.
   - EarthGen combines the advantages of hierarchical generation and compositional methods to achieve infinite-scale image generation while maintaining global consistency and local detail.
4. **Controllable Generation and 3D Scene Generation**:
   - The paper also demonstrates the potential of EarthGen in controllable generation (based on map layout) and 3D scene generation, which opens new possibilities for future content creation and applications.
### Main Contributions

- Propose a new type of continuous generation framework that can create photorealistic and arbitrarily sized visual images, spanning up to 5 levels with a resolution difference of up to 1024 times.
- Develop the EarthGen system, which can generate high-quality large-scale Earth-observation images, applicable to multiple fields such as computer vision, remote sensing, environmental science, agriculture, and urban planning.
- Significantly outperform existing methods in extreme super-resolution tasks (1024-fold magnification), with experimental results showing excellent performance in both quantitative and qualitative evaluations.
- Demonstrate the application potential of EarthGen in controllable generation and 3D world creation.

Through these contributions, the paper provides a powerful tool for large-scale Earth landscape image generation, helps to address environmental and social challenges, and provides new opportunities for research and applications in multiple fields.
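
### Illustrative sketch: cascaded, tiled upsampling

To make the idea of a super-resolution cascade combined with tiled generation more concrete, the following is a minimal sketch and not the authors' implementation. Every name in it (`total_zoom`, `placeholder_upsample`, `upsample_tiled`, `run_cascade`) is hypothetical, and the learned diffusion model is replaced by plain nearest-neighbour upsampling so that the example runs on its own. The per-stage factor is also an assumption: the summary only states an overall 1024-fold zoom across up to 5 levels, and a cascade of five 4x stages is one arrangement that would give 4^5 = 1024.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def total_zoom(factor_per_stage: int, num_stages: int) -> int:
    """Overall magnification when every stage upsamples by the same factor."""
    return factor_per_stage ** num_stages


def placeholder_upsample(tile: Image, factor: int) -> Image:
    """Stand-in for a learned super-resolution model (ASSUMPTION):
    nearest-neighbour upsampling, which adds no new detail."""
    out: Image = []
    for row in tile:
        wide = [v for v in row for _ in range(factor)]    # repeat each pixel horizontally
        out.extend(list(wide) for _ in range(factor))     # repeat each row vertically
    return out


def upsample_tiled(
    image: Image,
    tile: int,
    factor: int,
    model: Callable[[Image, int], Image] = placeholder_upsample,
) -> Image:
    """Upsample a large image one tile at a time so that memory use per
    model call depends on the tile size, not on the full image size."""
    h, w = len(image), len(image[0])
    out: Image = [[0.0] * (w * factor) for _ in range(h * factor)]
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            patch = [row[left:left + tile] for row in image[top:top + tile]]
            up = model(patch, factor)
            # Paste the upsampled patch at the matching location in the output.
            for dy, up_row in enumerate(up):
                y = top * factor + dy
                x = left * factor
                out[y][x:x + len(up_row)] = up_row
    return out


def run_cascade(image: Image, num_stages: int, factor: int, tile: int) -> Image:
    """Apply the tiled upsampler repeatedly; each stage refines the previous output."""
    for _ in range(num_stages):
        image = upsample_tiled(image, tile, factor)
    return image


if __name__ == "__main__":
    coarse = [[1.0, 2.0], [3.0, 4.0]]
    fine = run_cascade(coarse, num_stages=3, factor=2, tile=2)
    print(len(fine), len(fine[0]))   # side length grows by factor ** num_stages
    print(total_zoom(4, 5))          # one way to reach a 1024x overall zoom
```

In this sketch the tiles are processed independently, which is acceptable for nearest-neighbour upsampling but would leave visible seams with a generative model. How EarthGen keeps neighbouring tiles consistent is not described in this summary, so that part is deliberately left out.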