Remote Sensing ChatGPT: Solving Remote Sensing Tasks with ChatGPT and Visual Models

Haonan Guo, Xin Su, Chen Wu, Bo Du, Liangpei Zhang, Deren Li

2024-01-17

Abstract: Recently, flourishing large language models (LLMs), especially ChatGPT, have shown exceptional performance in language understanding, reasoning, and interaction, attracting users and researchers from many fields and domains. Although LLMs have shown a great capacity for human-like task accomplishment with natural language and natural images, their potential for handling remote sensing interpretation tasks has not yet been fully explored. Moreover, the lack of automation in remote sensing task planning hinders the accessibility of remote sensing interpretation techniques, especially to non-experts from other research fields. To this end, we present Remote Sensing ChatGPT, an LLM-powered agent that uses ChatGPT to connect various AI-based remote sensing models to solve complicated interpretation tasks. More specifically, given a user request and a remote sensing image, we use ChatGPT to understand the request, plan tasks according to each task's function, execute each subtask iteratively, and generate the final response from the output of each subtask. Because LLMs are trained on natural language and cannot directly perceive the visual concepts contained in remote sensing images, we designed visual cues that inject visual information into ChatGPT. With Remote Sensing ChatGPT, users can simply send a remote sensing image with the corresponding request and receive the interpretation results as well as language feedback. Experiments and examples show that Remote Sensing ChatGPT can tackle a wide range of remote sensing tasks and can be extended to more tasks with more sophisticated models such as remote sensing foundation models. The code and demo of Remote Sensing ChatGPT are publicly available at
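The visual-cue idea can be illustrated with a minimal sketch, assuming the cue is a plain-text image caption plus the image size, placed into the prompt alongside a list of tool descriptions. The `Tool` class, the `build_prompt` function, and the template wording are hypothetical; the `caption` argument stands in for the output of an image captioning model (the paper uses BLIP).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Tool:
    """Hypothetical description of one remote sensing model exposed to the LLM."""
    name: str
    description: str


def build_prompt(request: str, caption: str, width: int, height: int,
                 tools: list[Tool]) -> str:
    """Assemble a text-only prompt that carries visual cues about the image.

    `caption` stands in for a captioning model's output; width and height are
    simple image metadata. The template wording is an illustrative assumption,
    not the paper's actual prompt.
    """
    tool_lines = "\n".join(f"- {t.name}: {t.description}" for t in tools)
    return (
        "You are an assistant that solves remote sensing tasks by calling tools.\n"
        f"Available tools:\n{tool_lines}\n"
        f"Image description (visual cue): {caption}\n"
        f"Image size: {width} x {height} pixels\n"
        f"User request: {request}\n"
        "Answer with the tool names to call, in order."
    )


if __name__ == "__main__":
    tools = [Tool("object_detection", "detects objects such as ships")]
    print(build_prompt("Count the ships", "an aerial image of a harbor",
                       512, 512, tools))
```

Because the LLM only sees text, everything it knows about the image has to arrive through strings like the caption line; richer cues, such as a list of detected objects, would be appended to the template in the same way.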

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper proposes a solution for complex remote sensing image interpretation tasks. Although large language models such as ChatGPT excel at natural language understanding and interaction, their potential in the remote sensing domain has not been fully explored. Currently, remote sensing task planning is poorly automated, which limits non-experts' access to remote sensing interpretation techniques. To address this issue, the paper introduces Remote Sensing ChatGPT, an agent system that combines ChatGPT with various AI-based remote sensing models.

The workflow of Remote Sensing ChatGPT consists of prompt template generation, task planning, task execution, and response generation. First, the system generates prompt templates from the user request, using the BLIP model to provide visual cues that help ChatGPT understand the remote sensing image. Next, task planning determines which tasks to execute, such as scene classification and object detection, and the tasks are then executed based on a task library and examples. During execution, ChatGPT decides which tools to use, and it finally generates the response.

Experiments show that Remote Sensing ChatGPT achieves high task planning accuracy across different ChatGPT backends, reaching up to 94.9%. There are also failure cases, mainly because existing models do not support certain categories or because ChatGPT tends to speculate when information is missing. Future research directions include developing remote sensing foundation models with an expanded vocabulary and fine-tuning LLMs to improve performance. This work is an important step towards fully automated remote sensing image interpretation and can help researchers in many fields apply remote sensing interpretation techniques.
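The plan, execute, respond loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword-based `plan` function stands in for ChatGPT, the entries in the hypothetical `TOOLS` dictionary return fixed placeholder strings instead of running real remote sensing models, and `planning_accuracy` assumes an exact-match definition of task planning accuracy, which may differ from the metric used in the paper.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical tool library: each entry stands in for a real remote sensing
# model (scene classifier, object detector, ...). Outputs are fixed placeholders.
TOOLS: dict[str, Callable[[str], str]] = {
    "scene_classification": lambda image: "harbor",
    "object_detection": lambda image: "12 ships",
    "land_use_segmentation": lambda image: "water 40%, built-up 35%, vegetation 25%",
}


def plan(request: str) -> list[str]:
    """Keyword-based stand-in for the LLM planner.

    Returns tool names in execution order; the keyword rules are an assumption.
    """
    text = request.lower()
    steps: list[str] = []
    if "scene" in text or "what kind" in text:
        steps.append("scene_classification")
    if "count" in text or "detect" in text or "how many" in text:
        steps.append("object_detection")
    if "land use" in text or "segment" in text:
        steps.append("land_use_segmentation")
    return steps


def run(request: str, image: str) -> str:
    """Plan, execute each subtask in order, then compose a response from the outputs."""
    observations: list[str] = []
    for name in plan(request):
        result = TOOLS[name](image)  # execute one subtask
        observations.append(f"{name}: {result}")
    if not observations:
        # No tool supports the request: return a fixed message instead of
        # letting the planner speculate an answer.
        return "No available tool supports this request."
    return "; ".join(observations)


def planning_accuracy(predicted: list[list[str]], reference: list[list[str]]) -> float:
    """Fraction of requests whose predicted tool sequence exactly matches the reference."""
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


if __name__ == "__main__":
    print(run("What kind of scene is this, and how many ships are there?", "img.tif"))
```

Replacing `plan` with an LLM call and the lambdas with real models keeps the same structure: the planner only chooses tool names, and the final answer is assembled from tool outputs rather than from the planner's own guesses.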
