Abstract: The fundamental prerequisite for embodied agents to make intelligent decisions is autonomous cognition. Typically, agents optimize decision-making by leveraging extensive spatiotemporal information from episodic memory. Concurrently, they draw on long-term experience for task reasoning and to develop conscious behavioral tendencies. However, because these two cognitive abilities rely on markedly different, heterogeneous modalities, existing work has not designed effective mechanisms for coupling them and therefore fails to endow robots with comprehensive intelligence. This article introduces a navigation framework, hierarchical topology-semantic cognitive navigation (HTSCN), which integrates both memory and reasoning abilities within a single end-to-end system. Specifically, we represent memory and reasoning with a topological map and a semantic relation graph, respectively, within a unified dual-layer graph structure. Additionally, we incorporate a neural cognition-extraction process to capture cross-modal relationships between the hierarchical graphs. By linking these two cognitive modalities, HTSCN further improves decision-making performance and overall intelligence. Experimental results demonstrate that, compared with existing cognitive structures, HTSCN significantly improves the performance and path efficiency of image-goal navigation. Visualization and interpretability experiments further corroborate that memory, reasoning, and their online-learned relationships promote intelligent behavioral patterns. Finally, we deploy HTSCN in real-world scenarios to verify its feasibility and adaptability.
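The abstract does not specify how the two graph layers or the neural cognition-extraction module are implemented. As a rough illustration only, the sketch below assumes a toy dual-layer graph in pure Python: an upper topological layer of remembered places, a lower semantic layer of object-category relations, and cross-layer links recording which categories were seen at which place. The names `DualLayerGraph` and `score_places` are hypothetical, and a fixed additive fusion of scaled dot-product similarity and a semantic prior stands in for the learned cross-modal module; it is not the HTSCN method.

```python
import math
from dataclasses import dataclass, field


@dataclass
class DualLayerGraph:
    """Hypothetical two-layer cognitive graph (illustrative only, not HTSCN).

    Upper layer: topological map of visited places (episodic memory).
    Lower layer: semantic relation graph over object categories (long-term experience).
    Cross-layer links: the set of categories observed at each place.
    """
    place_feats: dict = field(default_factory=dict)       # place id -> feature vector
    place_edges: set = field(default_factory=set)         # undirected edges stored as (low, high)
    relation_weights: dict = field(default_factory=dict)  # (cat_a, cat_b) -> prior strength
    observed: dict = field(default_factory=dict)          # place id -> set of categories

    def add_place(self, pid, feat, categories, neighbor=None):
        # Add a place node, its cross-layer category links, and an optional edge
        # to the place the agent came from (a planner would traverse these edges).
        self.place_feats[pid] = list(feat)
        self.observed[pid] = set(categories)
        if neighbor is not None:
            self.place_edges.add((min(pid, neighbor), max(pid, neighbor)))

    def relation(self, a, b):
        # Symmetric lookup into the semantic layer; unknown pairs get weight 0.
        return self.relation_weights.get((a, b), self.relation_weights.get((b, a), 0.0))


def softmax(xs):
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def score_places(graph, goal_feat, goal_categories):
    """Fuse memory (feature similarity) and reasoning (semantic priors) into a
    probability distribution over remembered places. Hand-written stand-in for
    a learned cross-modal extraction step (assumption)."""
    d = len(goal_feat)
    pids = sorted(graph.place_feats)
    logits = []
    for pid in pids:
        feat = graph.place_feats[pid]
        # Scaled dot-product similarity between place and goal features.
        sim = sum(f * g for f, g in zip(feat, goal_feat)) / math.sqrt(d)
        # Strongest semantic relation between categories seen here and goal categories.
        semantic = max(
            (graph.relation(c, g) for c in graph.observed[pid] for g in goal_categories),
            default=0.0,
        )
        logits.append(sim + semantic)
    return dict(zip(pids, softmax(logits)))


# Usage: two places; the goal image resembles place 1 and shows a pillow,
# which the semantic layer relates to the bed seen at place 1.
g = DualLayerGraph(relation_weights={("bed", "pillow"): 2.0, ("sofa", "tv"): 1.5})
g.add_place(0, [1.0, 0.0], {"sofa"})
g.add_place(1, [0.0, 1.0], {"bed"}, neighbor=0)
probs = score_places(g, goal_feat=[0.0, 1.0], goal_categories={"pillow"})
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

Keeping the semantic layer separate from the place nodes lets one relation prior (here, bed and pillow) be reused at every place where a bed is observed, which is one plausible reading of why a dual-layer structure helps couple episodic memory with long-term reasoning.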

MemoNav: Working Memory Model for Visual Navigation

MemoNav: Selecting Informative Memories for Visual Navigation

SnapMem: Snapshot-based 3D Scene Memory for Embodied Exploration and Reasoning

A Novel Neural Multi-Store Memory Network for Autonomous Visual Navigation in Unknown Environment

Frontier-enhanced Topological Memory with Improved Exploration Awareness for Embodied Visual Navigation

Memory Proxy Maps for Visual Navigation

Vision-Dialog Navigation by Exploring Cross-modal Memory

Graph Attention Memory for Visual Navigation

ESceme: Vision-and-Language Navigation with Episodic Scene Memory

Learning Multimodal Adaptive Relation Graph and Action Boost Memory for Visual Navigation

Cognitive Navigation by Neuro-Inspired Localization, Mapping, and Episodic Memory

Cognitive Navigation for Intelligent Mobile Robots: A Learning-Based Approach with Topological Memory Configuration

Structured Scene Memory for Vision-Language Navigation

Visual Navigation Based on Language Assistance and Memory

Toward Learning-Based Visuomotor Navigation with Neural Radiance Fields

MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains

Visuomotor Navigation for Embodied Robots With Spatial Memory and Semantic Reasoning Cognition

A Navigation Cognitive System Driven by Hierarchical Spiking Neural Network

DGMem: Learning Visual Navigation Policy Without Any Labels by Dynamic Graph Memory

Planning from Imagination: Episodic Simulation and Episodic Memory for Vision-and-Language Navigation