ChatNav: Leveraging LLM to Zero-shot Semantic Reasoning in Object Navigation

Yong Zhu,Zhenyu Wen,Xiong Li,Xiufang Shi,Xiang Wu,Hui Dong,Jiming Chen
DOI: https://doi.org/10.1109/tcsvt.2024.3485907
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract:In object goal navigation tasks, the robot’s understanding of semantic relationships in the environment is a key factor in its ability to localize target objects. Previously, learning-based methods trained robots using 3D scene datasets to learn semantic relationships. However, these approaches perform poorly in new environments with unfamiliar semantic contexts. In this paper, we propose ChatNat which leverages the powerful knowledge summarizing and reasoning capabilities of a Large Language Model (LLM) for zero-shot inference of explicit semantic relationships. These relationships are further integrated into the navigation system for efficient localization of target objects. ChatNav employs a spatial object clustering algorithm to collect semantic clues and designs common-sense-based prompts for interacting with LLM. It then uses a gravity-repulsion model to convert inference results into heuristic factors for robust navigation decision-making. Our approach requires no additional training and can consistently obtain accurate semantic relationships from LLM, making it well-suited for navigating unknown environments. Experimental results demonstrate the outstanding navigation performance of our proposed method on the Gibson and HM3D datasets, surpassing the current state-of-the-art object goal navigation methods.
What problem does this paper attempt to address?