Think Holistically, Act Down-to-Earth: A Semantic Navigation Strategy with Continuous Environmental Representation and Multi-step Forward Planning

Bolei Chen, Jiaxu Kang, Ping Zhong, Yongzheng Cui, Siyi Lu, Yixiong Liang, Jianxin Wang
DOI: https://doi.org/10.1109/tcsvt.2023.3324380
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: The Object Goal Navigation (ObjectNav) task requires an agent to navigate through a previously unknown domestic scenario using spatial and semantic contextual information, where the goal is specified by a semantic label (e.g., find a TV). Such a task is especially challenging as it requires formulating and understanding the complex co-occurrence relations among objects in diverse settings, which is critical for long-sequence navigational decision-making. Existing methods learn either to explicitly represent co-occurrence relationships as discrete semantic priors or to implicitly encode them from raw observations, and thus cannot benefit from the rich environmental semantics. In this work, we propose a novel Deep Reinforcement Learning (DRL) based ObjectNav strategy that actively imagines spatial and semantic clues outside the agent’s Field of View (FoV) and further mines Continuous Environmental Representations (CER) using self-supervised learning. Additionally, the imagined spatial and semantic patterns allow the agent to perform Multi-Step Forward-Looking Planning (MSFLP) by considering the temporal evolution of egocentric local observations. Our approach is thoroughly evaluated and ablated in the visually realistic environments of the Matterport3D (MP3D) dataset. The experimental results show that our method, which combines CER and imagination-based MSFLP, facilitates learning complicated semantic priors and navigation skills, thus achieving state-of-the-art performance on the ObjectNav task. In addition, quantitative and qualitative analyses validate the generalization ability and superiority of our method.