HSPNav: Hierarchical Scene Prior Learning for Visual Semantic Navigation Towards Real Settings

Kang Jiaxu, Chen Bolei, Zhong Ping, Yang Haonan, Yu Sheng, Wang Jianxin
DOI: https://doi.org/10.1109/icra57147.2024.10610061
2024-01-01
Abstract: Visual Semantic Navigation (VSN) aims at navigating a robot to a given target object in a previously unseen scene. To tackle this task, the robot must learn a nimble navigation policy by utilizing spatial patterns and semantic co-occurrence relations among objects in the scene. Prevailing approaches extract scene priors from immediate visual observations and consolidate them in neural episodic memory to achieve flexible navigation. However, because these scene priors are forgotten and underused, such methods are plagued by repeated exploration, sparse effective knowledge, and wrong decisions. To alleviate these issues, we propose a novel VSN policy, HSPNav, based on Hierarchical Scene Priors (HSP) and Deep Reinforcement Learning (DRL). The HSP contains two components: the egocentric semantic map-based Local Scene Priors (LSP) and the commonsense relational graph-based Global Scene Priors (GSP). Efficient semantic navigation is then achieved by using the immediate LSP to retrieve helpful contextual memories from the GSP. Experimental results on the MP3D dataset in the Habitat simulator demonstrate that our HSP brings a significant boost over the baselines. Furthermore, we take an essential step from simulation to reality by bridging the gap from Habitat to ROS. The migration evaluations show that HSPNav generalizes well to realistic settings and achieves promising performance.