SpaRC and SpaRP: Spatial Reasoning Characterization and Path Generation for Understanding Spatial Reasoning Capability of Large Language Models

Md Imbesat Hassan Rizvi, Xiaodan Zhu, Iryna Gurevych
2024-06-07
Abstract: Spatial reasoning is a crucial component of both biological and artificial intelligence. In this work, we present a comprehensive study of the capability of current state-of-the-art large language models (LLMs) on spatial reasoning. To support our study, we created and contribute a novel Spatial Reasoning Characterization (SpaRC) framework and Spatial Reasoning Paths (SpaRP) datasets, to enable an in-depth understanding of the spatial relations and compositions as well as the usefulness of spatial reasoning chains. We found that none of the state-of-the-art LLMs perform well on the datasets; their performance is consistently low across different setups. The spatial reasoning capability improves substantially as model sizes scale up. Finetuning both large language models (e.g., Llama-2-70B) and smaller ones (e.g., Llama-2-13B) can significantly improve their F1-scores by 7 to 32 absolute points. We also found that the top proprietary LLMs still significantly outperform their open-source counterparts in topological spatial understanding and reasoning.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper studies how well Large Language Models (LLMs) perform on spatial reasoning. The researchers propose a new framework, Spatial Reasoning Characterization (SpaRC), and a set of datasets, Spatial Reasoning Paths (SpaRP), to support an in-depth understanding of spatial relations, their compositions, and the usefulness of spatial reasoning chains. They find that current state-of-the-art LLMs perform poorly on these datasets, although spatial reasoning ability improves substantially as model size increases. Fine-tuning both large and small LLMs (e.g., Llama-2-70B and Llama-2-13B) improves their F1 scores by 7 to 32 absolute points. The study also shows that the top proprietary LLMs still significantly outperform open-source models in topological spatial understanding and reasoning.

The main contributions of the paper are:
1. A comprehensive study of the spatial reasoning abilities of LLMs, covering different model sizes, pre-trained versus fine-tuned models, and decoding strategies.
2. The SpaRC framework, a bottom-up approach built on detailed spatial properties, which gives fine-grained control over spatial composition rules and context settings.
3. The SpaRP datasets, in which a symbolic spatial reasoner generates step-by-step reasoning chains that are then converted into textual reasoning paths; experiments with these datasets show that fine-tuning LLMs significantly improves their spatial reasoning performance.

The study also compares SpaRP with existing textual spatial reasoning benchmarks such as bAbI, SpartQA, SpaRTUN, and StepGame, and notes that LLMs lag behind neuro-symbolic methods in spatial reasoning. Overall, the paper provides a comprehensive evaluation of LLMs' spatial reasoning abilities and stresses the importance of extending the coverage of spatial properties in future research.
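To make the idea of a symbolic reasoner that emits textual reasoning paths, and of F1 scoring over predicted relations, more concrete, here is a toy sketch. It is not the authors' implementation, and every detail is an assumption made for illustration. The relation vocabulary is limited to six directional relations, each treated as a strict ordering on one of the x, y, z axes. The helper names (`compose`, `find_path`, `reason`, `relation_f1`) are hypothetical. Path search uses breadth-first search, and scoring uses set-based F1 between predicted and gold relation labels. SpaRC, as summarized above, also covers other spatial properties such as topology, which this toy omits.

```python
from collections import deque

# Hypothetical vocabulary: each relation constrains one axis with a sign.
# "A left B" is read as x_A < x_B, so it maps to ("x", -1).
AXIS_SIGN = {
    "left": ("x", -1), "right": ("x", +1),
    "below": ("y", -1), "above": ("y", +1),
    "behind": ("z", -1), "front": ("z", +1),
}
NAME = {constraint: rel for rel, constraint in AXIS_SIGN.items()}


def inverse(rels):
    """If A rels B, return the relations of B to A (flip each axis sign)."""
    return frozenset(NAME[(axis, -sign)] for axis, sign in (AXIS_SIGN[r] for r in rels))


def compose(r1, r2):
    """Given X r1 Y and Y r2 Z, keep only the axis constraints both hops
    share with the same sign; strict orderings are transitive, and
    anything else stays unknown."""
    shared = {AXIS_SIGN[r] for r in r1} & {AXIS_SIGN[r] for r in r2}
    return frozenset(NAME[c] for c in shared)


def build_graph(facts):
    """Adjacency list with both directions of every stated fact."""
    graph = {}
    for head, rels, tail in facts:
        rels = frozenset(rels)
        graph.setdefault(head, []).append((tail, rels))
        graph.setdefault(tail, []).append((head, inverse(rels)))
    return graph


def find_path(graph, src, dst):
    """Breadth-first search; returns a list of (head, rels, tail) hops or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt, rels in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, rels)
                queue.append(nxt)
    if dst not in parent:
        return None
    hops, node = [], dst
    while parent[node] is not None:
        prev, rels = parent[node]
        hops.append((prev, rels, node))
        node = prev
    return hops[::-1]


def fmt(rels):
    return " and ".join(sorted(rels)) or "unknown"


def reason(facts, src, dst):
    """Return (derived relations of src to dst, textual reasoning steps)."""
    hops = find_path(build_graph(facts), src, dst)
    if not hops:
        return frozenset(), []
    current = hops[0][1]
    steps = [f"Step 1: {src} is {fmt(current)} relative to {hops[0][2]}."]
    for i, (head, rels, tail) in enumerate(hops[1:], start=2):
        current = compose(current, rels)
        steps.append(f"Step {i}: {head} is {fmt(rels)} relative to {tail}, "
                     f"so {src} is {fmt(current)} relative to {tail}.")
    return current, steps


def relation_f1(predicted, gold):
    """Set-based F1 between predicted and gold relation labels."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


facts = [("A", {"left", "above"}, "B"), ("C", {"right"}, "B")]
relations, steps = reason(facts, "A", "C")
print(relations)  # frozenset({'left'})
print("\n".join(steps))
print(relation_f1({"left", "above"}, relations))
```

In this toy story, A is left of and above B, and C is right of B. The reasoner inverts the second fact (B is left of C) and composes the two hops. Only "left" survives, because nothing constrains A's vertical position relative to C. A hypothetical model answer of "left and above" would therefore score precision 0.5 and recall 1.0, for an F1 of about 0.67. Dropping unsupported axes, rather than guessing, keeps each verbalized step sound.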