ShapefileGPT: A Multi-Agent Large Language Model Framework for Automated Shapefile Processing

Qingming Lin, Rui Hu, Huaxia Li, Sensen Wu, Yadong Li, Kai Fang, Hailin Feng, Zhenhong Du, Liuchang Xu
2024-10-23
Abstract: Vector data is one of the two core data structures in geographic information science (GIS), essential for accurately storing and representing geospatial information. Shapefile, the most widely used vector data format, has become the industry standard supported by all major geographic information systems. However, processing this data typically requires specialized GIS knowledge and skills, creating a barrier for researchers from other fields and impeding interdisciplinary research in spatial data analysis. Moreover, while large language models (LLMs) have made significant advancements in natural language processing and task automation, they still face challenges in handling the complex spatial and topological relationships inherent in GIS vector data. To address these challenges, we propose ShapefileGPT, an innovative framework powered by LLMs, specifically designed to automate Shapefile tasks. ShapefileGPT utilizes a multi-agent architecture, in which the planner agent is responsible for task decomposition and supervision, while the worker agent executes the tasks. We developed a specialized function library for handling Shapefiles and provided comprehensive API documentation, enabling the worker agent to operate Shapefiles efficiently through function calling. For evaluation, we developed a benchmark dataset based on authoritative textbooks, encompassing tasks in categories such as geometric operations and spatial queries. ShapefileGPT achieved a task success rate of 95.24%, outperforming the GPT series models. In comparison to traditional LLMs, ShapefileGPT effectively handles complex vector data analysis tasks, overcoming the limitations of traditional LLMs in spatial analysis. This breakthrough opens new pathways for advancing automation and intelligence in the GIS field, with significant potential in interdisciplinary data analysis and application contexts.
Artificial Intelligence
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper aims to solve several key challenges in working with Shapefile data in geographic information systems (GIS):

1. **Barriers of expertise and technology**:
   - Shapefile is the most widely used vector data format, but processing it usually requires specialized GIS knowledge and skills. This sets a high technical threshold for researchers and professionals outside GIS who want to use Shapefiles for spatial data analysis.
   - Non-GIS users face a steep learning curve with professional GIS software (such as ArcGIS or QGIS), which limits its use in interdisciplinary research.
2. **Limitations of large language models (LLMs) in GIS tasks**:
   - Although LLMs have made remarkable progress in natural language processing and task automation, they still struggle with the complex spatial and topological relationships of GIS vector data.
   - Although current GPT models can generate code and automate routine tasks, the accuracy and reliability of the generated code are not guaranteed, especially when the code must handle the complex spatial relationships unique to GIS.
3. **The need for automation and intelligence**:
   - Reduce technical barriers so that researchers without GIS training can manage and analyze spatial data more effectively, promoting interdisciplinary cooperation and innovation.
   - Raise the level of automation and intelligence in GIS operations and advance Geospatial Artificial Intelligence (GeoAI).

### Solutions

To address these challenges, the paper proposes ShapefileGPT, an LLM framework built on a multi-agent architecture and designed specifically to automate Shapefile tasks:

- **Multi-agent architecture**:
  - **Planner Agent**: Responsible for task decomposition and supervision; it breaks user commands into subtasks.
  - **Worker Agent**: Executes each subtask and performs Shapefile operations by calling a predefined function library.
- **Function library**:
  - The authors developed a specialized function library with detailed API documentation, enabling the Worker Agent to operate on Shapefiles efficiently through function calling.
  - The library covers basic tabular data operations (such as renaming, filtering, and adding fields) and vector data operations (such as spatial joins, buffer generation, clipping, and geometric transformation).
- **Evaluation**:
  - The authors built a benchmark dataset based on authoritative textbooks, covering task categories such as geometric operations and spatial queries.
  - In the experiments, ShapefileGPT achieved a task success rate of 95.24%, outperforming the GPT series models.

Through these contributions, ShapefileGPT both improves the ability of LLMs to automate vector data processing and lowers the barrier to entry for users without GIS training, providing new tools and methods for interdisciplinary spatial data analysis.
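The planner/worker split and the function-calling pattern described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`rename_field`, `filter_rows`, `add_field`), the step format, and the hard-coded `planner` are all hypothetical stand-ins. Plain dictionaries stand in for a Shapefile attribute table so that the example runs with the standard library alone, and the planner's supervision role is omitted.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]   # one row of an attribute table
Step = Dict[str, Any]     # {"function": <name>, "args": {...}}


# --- Hypothetical function library (names and signatures are illustrative) ---
def rename_field(rows: List[Record], old: str, new: str) -> List[Record]:
    """Rename one attribute field in every row."""
    return [{(new if k == old else k): v for k, v in r.items()} for r in rows]


def filter_rows(rows: List[Record], field: str, min_value: float) -> List[Record]:
    """Keep rows whose field value is at least min_value."""
    return [r for r in rows if r[field] >= min_value]


def add_field(rows: List[Record], name: str, value: Any) -> List[Record]:
    """Add a new field with a constant value to every row."""
    return [{**r, name: value} for r in rows]


LIBRARY: Dict[str, Callable[..., List[Record]]] = {
    "rename_field": rename_field,
    "filter_rows": filter_rows,
    "add_field": add_field,
}


def planner(command: str) -> List[Step]:
    """Stand-in for the LLM planner: returns a fixed plan for the demo command.

    A real planner would prompt an LLM with the command and the API
    documentation; here the decomposition is hard-coded.
    """
    return [
        {"function": "rename_field", "args": {"old": "pop", "new": "population"}},
        {"function": "filter_rows", "args": {"field": "population", "min_value": 1000}},
        {"function": "add_field", "args": {"name": "checked", "value": True}},
    ]


def worker(rows: List[Record], step: Step) -> List[Record]:
    """Execute one subtask by looking up and calling a library function."""
    func = LIBRARY.get(step["function"])
    if func is None:
        raise ValueError(f"unknown function: {step['function']}")
    return func(rows, **step["args"])


def run(command: str, rows: List[Record]) -> List[Record]:
    """Planner decomposes the command; worker executes the steps in order."""
    for step in planner(command):
        rows = worker(rows, step)
    return rows


rows = [{"name": "A", "pop": 500}, {"name": "B", "pop": 2500}]
result = run("Rename pop to population, keep rows with at least 1000, flag them", rows)
print(result)
```

Restricting the worker to a fixed registry of documented functions, rather than letting it write free-form code, is what lets each step be validated before it runs. An unknown function name fails immediately instead of producing silently wrong output.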