Abstract: This paper investigates zero-shot object goal visual navigation. In object goal visual navigation, the agent must locate a navigation target using only its egocentric visual input. "Zero-shot" means that the target the agent must find does not appear among the targets used during training. To avoid coupling navigation ability with target features during training, we propose the Class-Independent Relationship Network (CIRN). The method combines object detection information with the relative semantic similarity between each detected object and the navigation target, and constructs a new state representation based on similarity ranking. This state representation contains neither target features nor environment features, which effectively decouples the agent's navigation ability from target features. A Graph Convolutional Network (GCN) is employed to learn the relationships between different objects based on their similarities. During testing, our approach demonstrates strong generalization, including on zero-shot navigation tasks with different targets and environments. Extensive experiments in the AI2-THOR virtual environment show that our method outperforms current state-of-the-art approaches on the zero-shot object goal visual navigation task. Experiments in the more challenging cross-target and cross-scene settings further validate the robustness and generalization ability of our method. Our code is available at https://github.com/SmartAndCleverRobot/ICRA-CIRN.
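To make the idea of a similarity-ranked, class-independent state more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each detection provides a class name, a bounding box, and a confidence score, and that semantic similarity is the cosine similarity between word embeddings. The function names `cosine` and `ranked_state`, the six-value row layout, the toy two-dimensional embeddings, and the choice `k=3` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ranked_state(detections, embeddings, target, k=3):
    """Build a k x 6 state: [similarity, cx, cy, w, h, confidence] per row.

    Rows are sorted by similarity to the target, so a row's position encodes
    rank rather than class identity. No class name or class feature is stored.
    (Assumed layout for illustration only.)
    """
    target_vec = embeddings[target]
    rows = []
    for name, box, conf in detections:
        sim = cosine(embeddings[name], target_vec)
        rows.append([sim, *box, conf])
    rows.sort(key=lambda r: r[0], reverse=True)  # most similar object first
    rows = rows[:k]
    rows += [[0.0] * 6 for _ in range(k - len(rows))]  # zero-pad to fixed size
    return rows


# Toy embeddings (hypothetical values, not real word vectors).
embeddings = {"mug": [1.0, 0.0], "cup": [0.9, 0.1], "sofa": [0.0, 1.0]}
detections = [
    ("sofa", (0.5, 0.6, 0.4, 0.3), 0.95),
    ("cup", (0.2, 0.3, 0.1, 0.1), 0.80),
]
state = ranked_state(detections, embeddings, target="mug", k=3)
```

Because the state holds only similarity values, box geometry, and confidences in rank order, a target class that was never used in training produces a state of the same form as a trained one, which is the decoupling the abstract describes.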
