DiffusionSat: A Generative Foundation Model for Satellite Imagery

Samar Khanna, Patrick Liu, Linqi Zhou, Chenlin Meng, Robin Rombach, Marshall Burke, David Lobell, Stefano Ermon
2024-05-26
Abstract: Diffusion models have achieved state-of-the-art results on many modalities including images, speech, and video. However, existing models are not tailored to support remote sensing data, which is widely used in important applications including environmental monitoring and crop-yield prediction. Satellite images are significantly different from natural images -- they can be multi-spectral, irregularly sampled across time -- and existing diffusion models trained on images from the Web do not support them. Furthermore, remote sensing data is inherently spatio-temporal, requiring conditional generation tasks not supported by traditional methods based on captions or images. In this paper, we present DiffusionSat, to date the largest generative foundation model trained on a collection of publicly available large, high-resolution remote sensing datasets. As text-based captions are sparsely available for satellite images, we incorporate the associated metadata such as geolocation as conditioning information. Our method produces realistic samples and can be used to solve multiple generative tasks including temporal generation, superresolution given multi-spectral inputs and in-painting. Our method outperforms previous state-of-the-art methods for satellite image generation and is the first large-scale generative foundation model for satellite imagery. The project website can be found here: https://samar-khanna.github.io/DiffusionSat/
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses the fact that existing diffusion models do not handle satellite remote sensing data well. Specifically:

1. **Differences between satellite images and natural images**: Satellite images are often multi-spectral, irregularly sampled across time, and inherently spatio-temporal, whereas existing diffusion models are trained primarily on natural images from the Web and do not support these properties.
2. **Lack of textual descriptions**: Text captions are sparsely available for satellite images, which makes conventional caption-based conditional generation difficult to apply.
3. **Inverse problems**: Satellite image processing involves many important inverse problems, such as super-resolution, cloud removal, and temporal interpolation, which existing diffusion models and methods do not solve effectively.

To fill this gap, the paper proposes **DiffusionSat**, a generative foundation model designed specifically for satellite imagery. By using the metadata associated with each image (such as geolocation, timestamp, and ground sampling distance) as conditioning information, DiffusionSat generates realistic satellite images and handles several generative tasks, including temporal generation, super-resolution given multi-spectral inputs, and in-painting.
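The summary above says that numeric metadata stands in for captions as the conditioning signal, but it does not say how those numbers reach the model. As a rough illustration only, the sketch below encodes each metadata field with the sinusoidal scheme commonly used for diffusion timesteps and concatenates the results into one conditioning vector. The function names, the field names, the embedding size, and the sample values are all hypothetical and are not taken from the paper.

```python
import math


def sinusoidal_embedding(value, dim=8, max_period=10000.0):
    """Encode one scalar as `dim` sin/cos features (dim must be even)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    # Geometrically spaced frequencies, starting at 1 and decaying toward 1/max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    sines = [math.sin(value * f) for f in freqs]
    cosines = [math.cos(value * f) for f in freqs]
    return sines + cosines


def embed_metadata(metadata, keys, dim=8):
    """Concatenate per-field embeddings in a fixed key order."""
    vec = []
    for key in keys:
        vec.extend(sinusoidal_embedding(float(metadata[key]), dim))
    return vec


# Hypothetical metadata record; field names and values are illustrative only.
KEYS = ("longitude", "latitude", "gsd_m", "year", "month", "day")
sample = {
    "longitude": -122.17,
    "latitude": 37.43,
    "gsd_m": 10.0,
    "year": 2020,
    "month": 6,
    "day": 15,
}

cond = embed_metadata(sample, KEYS, dim=8)
print(len(cond))  # 6 fields * 8 features = 48
```

In a real model, a vector like `cond` would typically pass through learned layers before being combined with the denoiser's other inputs; that step is omitted here because the summary gives no detail about it.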