MVSSC: Meta-reinforcement Learning Based Visual Indoor Navigation Using Multi-View Semantic Spatial Context

Wanruo Zhang, Hong Liu, Jianbing Wu, Yidi Li
DOI: https://doi.org/10.1016/j.patrec.2023.11.023
IF: 4.757
2023-01-01
Pattern Recognition Letters
Abstract: In Visual Indoor Navigation (VIN), Deep Reinforcement Learning (DRL) is commonly used by agents to achieve end-to-end mapping from vision to action when navigating toward a target based on observation. However, current DRL-based work suffers from two challenges: partial observability resulting from using solely a single first-person view and poor generalization in the case of unknown scenes and unknown objects. In light of these issues, this paper introduces the integration of multi-view as an expansion of observability and meta-learning as a primary generalization technique into the DRL framework and presents the meta-reinforcement learning method that leverages Multi-View Semantic Spatial Context (MVSSC). Specifically, aiming to explore the informative multi-view context for better efficiency in searching for and navigating to the target, we model the objects’ relationship from two aspects, multi-view semantic context (MVSEC) and multi-view spatial context (MVSPC). MVSEC enables agents to encode prior semantic relationships adaptively via a multi-view modulated graph. Meanwhile, MVSPC enhances the spatial representation of target-related objects’ correlation through similarity grids of multi-view. After adaptively fusing the multi-view and context information under the meta-reinforcement learning framework, our method can encourage efficient target search and robust navigation with stronger generalization performance to unknown scenes and unknown objects. Extensive experimental results on the AI2-THOR simulator demonstrate that our method outperforms the current state-of-the-art approaches.