CityGPT: Empowering Urban Spatial Cognition of Large Language Models

Jie Feng, Yuwei Du, Tianhui Liu, Siqi Guo, Yuming Lin, Yong Li
2024-06-20
Abstract: Large language models (LLMs) with powerful language generation and reasoning capabilities have already achieved success in many domains, e.g., math and code generation. However, because their training data lacks corpora and knowledge about the physical world, they usually fail to solve many real-life tasks in urban space. In this paper, we propose CityGPT, a systematic framework that enhances the ability of LLMs to understand urban space and solve related urban tasks by building a city-scale world model inside the model. First, we construct a diverse instruction tuning dataset, CityInstruction, to inject urban knowledge and enhance spatial reasoning capability effectively. Using a mixture of CityInstruction and general instruction data, we fine-tune various LLMs (e.g., ChatGLM3-6B and the Qwen1.5 and Llama3 series) to enhance their capability without sacrificing general abilities. To further validate the effectiveness of the proposed methods, we construct a comprehensive benchmark, CityEval, to evaluate the capability of LLMs on diverse urban scenarios and problems. Extensive evaluation results demonstrate that small LLMs trained with CityInstruction can achieve performance competitive with commercial LLMs in the comprehensive evaluation of CityEval. The source code is openly accessible to the research community at https://github.com/tsinghua-fib-lab/CityGPT.
Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper aims to address the limitations of Large Language Models (LLMs) in understanding urban spaces and solving urban-related tasks. Specifically, the authors identify that while LLMs have demonstrated remarkable capabilities in various domains, they struggle with real-life tasks in urban environments due to a lack of detailed geospatial knowledge of the physical world, particularly at the city scale. This limitation is attributed to the absence of relevant training data on the urban environment.

To tackle this issue, the authors propose CityGPT, a systematic framework that includes two main components:

1. **CityInstruction**: A diverse instruction tuning dataset designed to inject urban knowledge into general LLMs and enhance their understanding of urban spaces and their ability to solve urban-related tasks. This dataset is constructed through a mobility simulator that mimics human exploration of cities, collecting multi-view data and extending it with explicit spatial reasoning steps.
2. **CityEval**: A comprehensive evaluation benchmark designed to assess the capabilities of LLMs in understanding urban spaces and solving urban tasks. This benchmark evaluates LLMs across various urban scenarios and downstream tasks, including City Image (intuitive understanding of urban elements), Urban Semantics (effects of human activities on urban environments), Spatial Reasoning (high-level spatial cognitive capabilities), and Composite Tasks (integration of multiple capabilities).
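
To make the CityEval breakdown concrete, the following is a minimal sketch of how per-category scores could be aggregated across the four task groups named above. It is not the authors' evaluation code: the record layout (`category`, `prediction`, `answer`), the function name `accuracy_by_category`, and the case-insensitive exact-match scoring are all assumptions made for illustration, and the sample records are invented.

```python
from collections import defaultdict

# The four category names come from the summary above; everything else
# (record layout, exact-match scoring) is an assumption for illustration.
CATEGORIES = ("City Image", "Urban Semantics", "Spatial Reasoning", "Composite Tasks")


def accuracy_by_category(records):
    """Return {category: fraction of exact-match answers} for the given records.

    Each record is a hypothetical dict with keys 'category', 'prediction'
    and 'answer'. Categories with no records are omitted from the result.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        cat = rec["category"]
        if cat not in CATEGORIES:
            raise ValueError(f"unknown category: {cat!r}")
        total[cat] += 1
        # Case- and whitespace-insensitive exact match (an assumed metric).
        if rec["prediction"].strip().upper() == rec["answer"].strip().upper():
            correct[cat] += 1
    return {cat: correct[cat] / total[cat] for cat in CATEGORIES if total[cat]}


# Made-up records, only to show the call shape.
sample = [
    {"category": "City Image", "prediction": "A", "answer": "A"},
    {"category": "City Image", "prediction": "b", "answer": "C"},
    {"category": "Spatial Reasoning", "prediction": " d ", "answer": "D"},
]
scores = accuracy_by_category(sample)
print(scores)
```

Reporting one score per category, rather than a single pooled number, keeps a weakness in one capability such as Spatial Reasoning from being hidden by strong results elsewhere.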