MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation

Zhiping Yu, Chenyang Liu, Liqin Liu, Zhenwei Shi, Zhengxia Zou
2024-10-15
Abstract: The recent advancement of generative foundational models has ushered in a new era of image generation in the realm of natural images, revolutionizing art design, entertainment, environment simulation, and beyond. Despite producing high-quality samples, existing methods are constrained to generating images of scenes at a limited scale. In this paper, we present MetaEarth, a generative foundation model that breaks the barrier by scaling image generation to a global level, exploring the creation of worldwide, multi-resolution, unbounded, and virtually limitless remote sensing images. In MetaEarth, we propose a resolution-guided self-cascading generative framework, which enables the generation of images of any region with a wide range of geographical resolutions. To achieve unbounded and arbitrary-sized image generation, we design a novel noise sampling strategy for denoising diffusion models by analyzing the generation conditions and initial noise. To train MetaEarth, we construct a large dataset comprising multi-resolution optical remote sensing images with geographical information. Experiments have demonstrated the powerful capabilities of our method in generating global-scale images. Additionally, MetaEarth serves as a data engine that can provide high-quality and rich training data for downstream tasks. Our model opens up new possibilities for constructing generative world models by simulating Earth visuals from an innovative overhead perspective.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the limitations of existing generative models in producing large-scale, multi-resolution, unbounded remote sensing images. Although existing models can generate high-quality natural-scene images, those images are usually confined to scenes of everyday human activity and have limited resolution and information capacity. The paper proposes MetaEarth, a generative foundation model that extends image generation from local, everyday scenes to a global scale. MetaEarth achieves global-scale, multi-resolution, unbounded, and virtually limitless remote sensing image generation through a resolution-guided self-cascading generation framework and a novel noise sampling strategy.

### Main Challenges and Solutions

1. **Model Capacity**:
   - **Challenge**: Generating global-scale images requires handling a wide range of geographical features, such as cities, forests, deserts, oceans, glaciers, and snowfields, which demands a very high-capacity model.
   - **Solution**: A generative foundation model with over 600 million parameters was built and trained under the denoising diffusion paradigm.
2. **Controllable-Resolution Image Generation**:
   - **Challenge**: Different imaging altitudes yield images at different resolutions, so the same geographical features appear with markedly different levels of detail. Few studies have addressed controllable-resolution image generation.
   - **Solution**: A single unified model is used for all stages. At each cascading stage, the previously generated low-resolution image and the geographical resolution serve as conditioning variables that guide the generation of a higher-resolution image.
3. **Unbounded Image Generation**:
   - **Challenge**: Remote sensing images are usually extremely large, while existing natural-image generation algorithms can generally produce only 512×512 or 1024×1024 pixel images. Generating continuous, unbounded, large images remains an unsolved problem.
   - **Solution**: A new noise sampling strategy was designed. By analyzing the generation conditions and the initial noise, it keeps adjacent generated image blocks stylistically and semantically consistent, which enables the generation of images of arbitrary size.

### Experimental Results

The experiments show that MetaEarth generates high-quality, multi-resolution images with global coverage and diverse content. Ablation studies verify the effectiveness of the design choices described above. In addition, MetaEarth can serve as a data engine that provides high-quality, diverse training data for downstream tasks. Using image classification as an example, the experiments show that generated samples from MetaEarth can significantly improve classification accuracy.

### Paper Contributions

1. Proposes MetaEarth, the first generative foundation model to extend image generation to a global scale.
2. Proposes a resolution-guided self-cascading generation framework and a noise sampling strategy, which address the challenges of cross-resolution generation and continuous, unbounded image generation.
3. As a generative data engine, MetaEarth has the potential to provide virtual environments and realistic training data for various downstream tasks in remote sensing and Earth observation, opening up new possibilities for constructing generative world models.
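
The two mechanisms summarized above, resolution-conditioned cascading and consistent noise across image blocks, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, the 4× factor, the 64 m starting resolution, and the nearest-neighbor stand-in for the diffusion model are all assumptions made for illustration. In particular, cropping every block's initial noise from one shared Gaussian field is just one simple way to make overlapping blocks start from identical noise; the actual noise sampling strategy in MetaEarth may differ.

```python
import numpy as np


def nearest_upsample(image, resolution_m, factor):
    """Hypothetical stand-in for the generative model.

    A real model would be conditioned on both the low-resolution image and
    the target ground resolution; this placeholder ignores resolution_m and
    simply repeats pixels.
    """
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)


def self_cascade(image, resolution_m, stages, factor=4, generate=nearest_upsample):
    """Run the same generator repeatedly, feeding each output back as input.

    Each stage receives the previous image and the next (finer) ground
    resolution in metres per pixel. Returns the final image and a history
    of (resolution, shape) pairs.
    """
    history = [(resolution_m, image.shape)]
    for _ in range(stages):
        resolution_m = resolution_m / factor
        image = generate(image, resolution_m, factor)
        history.append((resolution_m, image.shape))
    return image, history


def tile_noise_from_shared_field(height, width, tile, stride, seed=0):
    """Crop each tile's initial noise from a single Gaussian noise field.

    Tiles that overlap in image space therefore share identical noise values
    in their overlapping region (assumed illustration, not the paper's method).
    """
    rng = np.random.default_rng(seed)
    field = rng.standard_normal((height, width))
    tiles = {}
    for top in range(0, height - tile + 1, stride):
        for left in range(0, width - tile + 1, stride):
            tiles[(top, left)] = field[top:top + tile, left:left + tile]
    return tiles


if __name__ == "__main__":
    coarse = np.zeros((2, 2))
    fine, history = self_cascade(coarse, resolution_m=64.0, stages=2)
    for res, shape in history:
        print(f"{res} m/px -> {shape}")

    tiles = tile_noise_from_shared_field(8, 12, tile=8, stride=4)
    left, right = tiles[(0, 0)], tiles[(0, 4)]
    print("overlap identical:", np.array_equal(left[:, 4:], right[:, :4]))
```

In this sketch a single `generate` callable is reused at every stage, mirroring the idea of one unified model guided by resolution rather than a separate model per scale. Swapping in a real conditional sampler would only require matching the `(image, resolution_m, factor)` signature assumed here.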