Abstract: Spatial reasoning is a crucial component of both biological and artificial intelligence. In this work, we present a comprehensive study of the capability of current state-of-the-art large language models (LLMs) on spatial reasoning. To support our study, we created and contribute a novel Spatial Reasoning Characterization (SpaRC) framework and Spatial Reasoning Paths (SpaRP) datasets, to enable an in-depth understanding of the spatial relations and compositions as well as the usefulness of spatial reasoning chains. We found that all the state-of-the-art LLMs do not perform well on the datasets; their performances are consistently low across different setups. The spatial reasoning capability improves substantially as model sizes scale up. Finetuning both large language models (e.g., Llama-2-70B) and smaller ones (e.g., Llama-2-13B) can significantly improve their F1-scores by 7-32 absolute points. We also found that the top proprietary LLMs still significantly outperform their open-source counterparts in topological spatial understanding and reasoning.

What problem does this paper attempt to address?

This paper studies how well large language models (LLMs) perform spatial reasoning. The researchers propose a new framework, Spatial Reasoning Characterization (SpaRC), and a set of datasets, Spatial Reasoning Paths (SpaRP), to deepen understanding of spatial relations, their compositions, and spatial reasoning chains. They find that current state-of-the-art LLMs perform poorly on these datasets, although spatial reasoning ability improves substantially as model size increases. Fine-tuning both larger models (e.g., Llama-2-70B) and smaller ones (e.g., Llama-2-13B) improves their F1 scores by 7 to 32 absolute points. The study also finds that the top proprietary LLMs still significantly outperform open-source models in topological spatial understanding and reasoning.

The main contributions of the paper include:

1. A comprehensive study of the spatial reasoning abilities of LLMs, covering different parameter sizes, pre-trained and fine-tuned models, and decoding strategies.
2. The SpaRC framework, a bottom-up approach that focuses on detailed spatial properties in order to finely control spatial composition rules and contextual settings.
3. The SpaRP datasets, which are built by generating step-by-step reasoning with a symbolic spatial reasoner and converting the steps into textual reasoning paths. Fine-tuning LLMs on them significantly improves their spatial reasoning performance.

The study also compares existing textual spatial reasoning tasks such as bAbI, SpartQA, SpaRTUN, and StepGame, and points out that LLMs lag behind neuro-symbolic methods in spatial reasoning. Overall, the paper provides a comprehensive evaluation of LLMs' spatial reasoning abilities and emphasizes the importance of extending the characterization to further spatial properties in future research.
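
To make the idea of a symbolic reasoner that emits textual reasoning paths more concrete, here is a minimal toy sketch. It is not the authors' reasoner or the SpaRP generation code. It assumes point objects, four directional relations, and a simple sign-based composition rule, and every name in it (`RELATIONS`, `add_signs`, `describe`, `reasoning_path`) is hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

# Assumption: "A is <rel> of B" is encoded as the signs (dx, dy) of A's
# offset from B, with objects treated as points (a StepGame-like simplification).
RELATIONS = {
    "left": (-1, 0),
    "right": (1, 0),
    "above": (0, 1),
    "below": (0, -1),
}

Sign = Optional[int]  # -1, 0, +1, or None when the sign cannot be determined


def add_signs(a: Sign, b: Sign) -> Sign:
    """Compose two offsets along one axis, tracking only their signs."""
    if a is None or b is None:
        return None
    if a == 0:
        return b
    if b == 0 or a == b:
        return a
    return None  # opposite signs: the combined direction is ambiguous


def describe(dx: Sign, dy: Sign) -> str:
    """Turn a pair of signs into a short textual relation."""
    parts = []
    if dx is None:
        parts.append("horizontally undetermined")
    elif dx != 0:
        parts.append("left" if dx < 0 else "right")
    if dy is None:
        parts.append("vertically undetermined")
    elif dy != 0:
        parts.append("below" if dy < 0 else "above")
    return " and ".join(parts) if parts else "at the same position"


def reasoning_path(
    facts: List[Tuple[str, str, str]]
) -> Tuple[List[str], Tuple[Sign, Sign]]:
    """Chain facts (head, relation, tail) and record one text step per hop."""
    head, rel, current = facts[0]
    dx, dy = RELATIONS[rel]
    steps = [f"Step 1: {head} lies {describe(dx, dy)} relative to {current} (given)."]
    for i, (h, rel, t) in enumerate(facts[1:], start=2):
        if h != current:
            raise ValueError("facts must form a chain")
        sx, sy = RELATIONS[rel]
        dx, dy = add_signs(dx, sx), add_signs(dy, sy)
        steps.append(
            f"Step {i}: {h} lies {rel} relative to {t}, "
            f"so {head} lies {describe(dx, dy)} relative to {t}."
        )
        current = t
    return steps, (dx, dy)


if __name__ == "__main__":
    path, _ = reasoning_path([("A", "left", "B"), ("B", "above", "C")])
    print("\n".join(path))
```

The point of the sketch is the separation between the symbolic composition (`add_signs`) and the verbalization (`describe`): the reasoner decides each step, and the text is derived from it. The framework described in the paper covers richer spatial properties, including topological relations, than this toy handles.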

SpaRC and SpaRP: Spatial Reasoning Characterization and Path Generation for Understanding Spatial Reasoning Capability of Large Language Models

Dspy-based Neural-Symbolic Pipeline to Enhance Spatial Reasoning in LLMs

An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models

Exploring and Improving the Spatial Reasoning Abilities of Large Language Models

Can Large Language Models Create New Knowledge for Spatial Reasoning Tasks?

Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning

Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models

Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning

Can Large Language Models Reason about the Region Connection Calculus?

The Curious Case of Nonverbal Abstract Reasoning with Multi-Modal Large Language Models

PLUGH: A Benchmark for Spatial Understanding and Reasoning in Large Language Models

Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models

Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models

VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs

Inherent limitations of LLMs regarding spatial information

Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models

Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs

SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models

SAT: Spatial Aptitude Training for Multimodal Language Models

SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models