Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models

Haonan Guo, Xin Su, Chen Wu, Bo Du, Liangpei Zhang, Deren Li
2024-01-17
Abstract: Recently, flourishing large language models (LLMs), especially ChatGPT, have shown exceptional performance in language understanding, reasoning, and interaction, attracting users and researchers from multiple fields and domains. Although LLMs have shown great capacity for human-like task accomplishment on natural language and natural images, their potential in handling remote sensing interpretation tasks has not yet been fully explored. Moreover, the lack of automation in remote sensing task planning hinders the accessibility of remote sensing interpretation techniques, especially to non-remote sensing experts from multiple research fields. To this end, we present Remote Sensing ChatGPT, an LLM-powered agent that utilizes ChatGPT to connect various AI-based remote sensing models to solve complicated interpretation tasks. More specifically, given a user request and a remote sensing image, we utilize ChatGPT to understand the user request, perform task planning according to the tasks' functions, execute each subtask iteratively, and generate the final response according to the output of each subtask. Considering that an LLM is trained on natural language and is not capable of directly perceiving the visual concepts contained in remote sensing images, we design visual cues that inject visual information into ChatGPT. With Remote Sensing ChatGPT, users can simply send a remote sensing image with the corresponding request and get the interpretation results as well as language feedback from Remote Sensing ChatGPT. Experiments and examples show that Remote Sensing ChatGPT can tackle a wide range of remote sensing tasks and can be extended to more tasks with more sophisticated models such as remote sensing foundation models. The code and demo of Remote Sensing ChatGPT are publicly available at
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the problem of automating complex remote sensing image interpretation. Although large language models such as ChatGPT excel at natural language understanding and interaction, their potential in the remote sensing domain has not been fully explored. Remote sensing task planning is currently poorly automated, which limits non-experts' access to remote sensing interpretation techniques. To address this, the paper introduces Remote Sensing ChatGPT, an agent system that combines ChatGPT with various AI-based remote sensing models. Its workflow has four stages: prompt template generation, task planning, task execution, and response generation. First, the system builds a prompt template from the user request, using the BLIP model to supply visual cues that help ChatGPT understand the remote sensing image. Next, task planning determines which tasks to run, such as scene classification or object detection, based on a task library and examples. During execution, ChatGPT decides which tools to call, each subtask is run in turn, and the final response is generated from the subtask outputs. Experiments show that Remote Sensing ChatGPT achieves high task planning accuracy across different ChatGPT backends, with a maximum accuracy of 94.9%. There are also failure cases, mainly because the existing models do not support certain categories or because ChatGPT tends to guess an answer when information is missing. Future research directions may include developing remote sensing foundation models with an expanded vocabulary or fine-tuning LLMs to improve performance. This work is a step towards fully automated remote sensing image interpretation and can help researchers in multiple fields apply remote sensing interpretation techniques.
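The four-stage workflow summarized above (prompt template generation, task planning, task execution, response generation) can be illustrated with a small sketch. This is not the authors' implementation: every name in it (`Tool`, `build_prompt`, `plan_tasks`, `run_agent`, and the `fake_*` functions) is hypothetical, the captioner is a stub standing in for BLIP, and a simple keyword matcher stands in for ChatGPT's planning so that the example runs without any model or API access.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    """One entry in a hypothetical task library."""
    name: str
    description: str
    run: Callable[[str], str]  # takes an image path, returns a text result


# Stubs standing in for real models (assumptions, not the paper's code).
def fake_caption(image_path: str) -> str:
    # A real system would call an image captioner such as BLIP here.
    return "an aerial view of an airport with several planes"


def fake_scene_classifier(image_path: str) -> str:
    return "scene: airport"


def fake_object_detector(image_path: str) -> str:
    return "objects: 4 planes"


TOOLS: Dict[str, Tool] = {
    "scene_classification": Tool(
        "scene_classification", "Classify the scene type", fake_scene_classifier
    ),
    "object_detection": Tool(
        "object_detection", "Detect objects in the image", fake_object_detector
    ),
}

# Keywords used by the stand-in planner; a real agent would ask the LLM instead.
PLANNER_KEYWORDS = {
    "scene_classification": ("scene", "classify"),
    "object_detection": ("detect", "count"),
}


def build_prompt(request: str, image_path: str, tools: Dict[str, Tool]) -> str:
    """Stage 1: fill a prompt template with the tool list and a visual cue."""
    tool_lines = "\n".join(f"- {t.name}: {t.description}" for t in tools.values())
    return (
        f"Image: {image_path}\n"
        f"Visual cue: {fake_caption(image_path)}\n"
        f"Tools:\n{tool_lines}\n"
        f"Request: {request}"
    )


def plan_tasks(prompt: str, tools: Dict[str, Tool]) -> List[str]:
    """Stage 2: choose tools. Keyword matching replaces the LLM call here."""
    request = prompt.rsplit("Request: ", 1)[1].lower()
    return [
        name
        for name in tools
        if any(word in request for word in PLANNER_KEYWORDS.get(name, ()))
    ]


def run_agent(request: str, image_path: str) -> dict:
    prompt = build_prompt(request, image_path, TOOLS)
    plan = plan_tasks(prompt, TOOLS)
    # Stage 3: execute each planned subtask in order.
    outputs = [TOOLS[name].run(image_path) for name in plan]
    # Stage 4: compose a response from the subtask outputs.
    if outputs:
        response = "Results: " + "; ".join(outputs)
    else:
        response = "No suitable tool was found for this request."
    return {"prompt": prompt, "plan": plan, "outputs": outputs, "response": response}


if __name__ == "__main__":
    result = run_agent("What scene is this, and detect the planes", "airport.png")
    print(result["plan"])
    print(result["response"])
```

The fallback branch in `run_agent` reflects one of the failure modes noted above: when no tool covers a request, returning an explicit "no suitable tool" message is one way to avoid a guessed answer.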