Abstract: When determining navigation actions, it is important to design effective visual and semantic representations of the observed scenes, as well as robust navigation strategies. This paper proposes a goal-oriented visual semantic navigation method using a semantic knowledge graph and a transformer. Two kinds of knowledge graphs representing the location relationships between objects are constructed: a current knowledge graph and a prior knowledge graph. The pre-constructed prior knowledge graph is periodically updated with the current knowledge graph obtained in real time, and is embedded into a semantic feature vector through a graph convolutional network (GCN). The semantic features and the extracted scene features are jointly embedded and stored, and are then fed together into a transformer module to explore the spatio-temporal dependencies between objects in the environment. The navigation strategy is obtained from an Asynchronous Advantage Actor-Critic (A3C) model composed of a Long Short-Term Memory (LSTM) network and a Multi-Layer Perceptron (MLP). Experiments show that the knowledge graph significantly improves navigation performance. More importantly, our experimental results show that the method improves the generalization of navigation to novel scenes and novel objects. A video is available at https://youtu.be/ZMjNvoK2rbY.

Note to Practitioners: The motivation of this work is to develop an efficient visual semantic navigation method. Conventional navigation algorithms lack semantic information and learning ability, and cannot adapt to complex, unknown environments. When semantic information is included in navigation, the location relationships between objects can be obtained as prior knowledge, which can be combined with reinforcement learning to achieve autonomous navigation of agents. In this article, a knowledge graph representing the location relationships between objects is constructed and regularly updated with observations gathered in real time. The proposed visual semantic navigation method further improves the generalization ability of navigation. It can be applied to mobile robots and deployed in many scenarios, such as homes, restaurants, hospitals, and even factories.
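
The abstract does not specify how the prior graph is updated or which GCN variant embeds it. As a rough sketch only, assuming both graphs are stored as weighted adjacency matrices over a fixed object vocabulary, the Python below blends the current graph into the prior with a hypothetical moving-average rule (`update_prior`, weight `alpha`) and then applies one standard GCN layer (the symmetric normalization of Kipf and Welling) to produce one semantic embedding per object. The object names, weights, and function names are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List

Matrix = List[List[float]]


def update_prior(prior: Matrix, current: Matrix, alpha: float = 0.1) -> Matrix:
    """Blend the current (observed) graph into the prior graph.

    HYPOTHETICAL rule: an exponential moving average with weight `alpha`.
    The abstract only says the prior is "periodically updated"; the real rule may differ.
    """
    n = len(prior)
    return [[(1.0 - alpha) * prior[i][j] + alpha * current[i][j] for j in range(n)]
            for i in range(n)]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2} (Kipf & Welling)."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Degree of each node in A + I (always >= 1 for non-negative edge weights).
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of an (m x k) and a (k x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One GCN layer: ReLU(A_norm @ H @ W), giving one embedding row per object."""
    propagated = matmul(matmul(normalize_adjacency(adj), features), weight)
    return [[max(0.0, x) for x in row] for row in propagated]


# Hypothetical 3-object vocabulary: sofa, television, remote.
prior = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
current = [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]  # remote seen near sofa
updated = update_prior(prior, current, alpha=0.5)
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one-hot node features
weight = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.6]]                 # arbitrary 3x2 weights
embeddings = gcn_layer(updated, features, weight)
print(embeddings)  # 3 rows (objects) x 2 columns (embedding size)
```

In a full system the one-hot features would be replaced by learned or word-embedding features, and the weights would be trained jointly with the policy; the blend weight `alpha` controls how quickly new observations override the prior.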

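The navigation strategy is learned with A3C, but the abstract gives neither the exact loss nor the hyperparameters. The following is therefore only a generic sketch of how an A3C worker could turn one rollout into training targets: n-step discounted returns bootstrapped from the critic, advantages A_t = R_t - V(s_t), and the usual combined policy, value, and entropy loss. In the described architecture the log-probabilities and value estimates would come from the LSTM and MLP heads; here they are plain numbers, and `gamma`, `value_coef`, and `entropy_coef` are assumed common defaults rather than the paper's settings.

```python
from typing import List, Tuple


def nstep_returns_and_advantages(
    rewards: List[float],
    values: List[float],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> Tuple[List[float], List[float]]:
    """n-step discounted returns R_t and advantages A_t = R_t - V(s_t) for one rollout.

    `values[t]` is the critic estimate V(s_t); `bootstrap_value` is V(s_T) for the state
    after the last step, or 0.0 if the episode ended (e.g. the target object was found).
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    # Walk backwards: R_t = r_t + gamma * R_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    advantages = [r - v for r, v in zip(returns, values)]
    return returns, advantages


def a3c_loss(
    log_probs: List[float],
    entropies: List[float],
    advantages: List[float],
    value_coef: float = 0.5,
    entropy_coef: float = 0.01,
) -> float:
    """Combined actor-critic loss for one rollout.

    In an autodiff framework the advantages in the policy term are treated as
    constants (detached); the coefficients here are assumed defaults, not the paper's.
    """
    policy_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages))
    value_loss = value_coef * sum(adv * adv for adv in advantages)
    entropy_bonus = entropy_coef * sum(entropies)
    return policy_loss + value_loss - entropy_bonus


returns, advantages = nstep_returns_and_advantages(
    rewards=[0.0, 0.0, 1.0], values=[0.5, 0.5, 0.5], bootstrap_value=0.0, gamma=0.5
)
print(returns)     # [0.25, 0.5, 1.0]
print(advantages)  # [-0.25, 0.0, 0.5]
```

Bootstrapping from V(s_T) lets each worker update after a fixed number of steps rather than waiting for the episode to end.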
HOGN-TVGN: Human-inspired Embodied Object Goal Navigation Based on Time-varying Knowledge Graph Inference Networks for Robots

ChatNav: Leveraging LLM to Zero-shot Semantic Reasoning in Object Navigation

Goal-Oriented Visual Semantic Navigation Using Semantic Knowledge Graph and Transformer

Knowledge-Enhanced Scene Context Embedding for Object-Oriented Navigation of Autonomous Robots

Aligning Knowledge Graph with Visual Perception for Object-goal Navigation

Remote object navigation for service robots using hierarchical knowledge graph in human-centered environments

Automatic Navigation for Rat-Robots with Modeling of the Human Guidance

Learning to Navigate using Visual Sensor Networks

A Navigation Cognitive System Driven by Hierarchical Spiking Neural Network

Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships

MGRL: Graph neural network based inference in a Markov network with reinforcement learning for visual navigation

Robotic Navigation Based on Experiences and Predictive Map Inspired by Spatial Cognition

Socially Aware Object Goal Navigation with Heterogeneous Scene Representation Learning

Extracting Dynamic Navigation Goal from Natural Language Dialogue

Two-Stage Depth Enhanced Learning with Obstacle Map For Object Navigation

An Object-driven Navigation Strategy Based on Active Perception and Semantic Association

Towards Coarse-grained Visual Language Navigation Task Planning Enhanced by Event Knowledge Graph

Object-Based Reliable Visual Navigation for Mobile Robot

Visual Semantic Navigation using Scene Priors

HSPNav: Hierarchical Scene Prior Learning for Visual Semantic Navigation Towards Real Settings

LMD-PGN: Cross-Modal Knowledge Distillation from First-Person-View Images to Third-Person-View BEV Maps for Universal Point Goal Navigation