Diffusion-Geo: A Two-Stage Controllable Text-To-Image Generative Model for Remote Sensing Scenarios

He Chen, Liang Chen, Wei Zhang, Miaoxin Cai, Tong Zhang, Can Li, Zhuang Yin
DOI: https://doi.org/10.1109/IGARSS53475.2024.10641523
2024-07-07
Abstract: Image generation is a crucial task for facilitating intelligent interpretation in the remote sensing domain. Expanding dataset size through image generation can enhance model performance on downstream tasks. However, current generative models in remote sensing are mostly unconditional or guided by simple text, so the generated images lack spatial and semantic constraints. This lack of control can degrade the performance of downstream task models. To tackle these challenges, a two-stage controllable text-to-image generative model called Diffusion-Geo is presented. In the first stage, an extensive image-text generation dataset called RS-Control is created through prompt engineering of multimodal large language models (MLLMs) and manual prompts for existing datasets; it incorporates diverse conditional controls with rich spatial and semantic information. The RS-Control dataset is then used to train a universal controllable image generative model. The second stage involves efficiently tuning the universal model for different task datasets, minimizing fine-tuning costs while preserving diversity and high-quality features. Experiments conducted on the RSICD caption dataset and the WHU change detection dataset demonstrate the superiority of Diffusion-Geo over other state-of-the-art models in image generation.
Environmental Science, Computer Science