Abstract: Object Goal Navigation (ObjectNav) requires an agent to navigate to an object in an unseen environment, an ability often needed to accomplish complex tasks. Though it has drawn increasing attention from researchers in the Embodied AI community, there has not been a contemporary and comprehensive survey of ObjectNav. In this survey, we give an overview of the field by summarizing more than 70 recent papers. First, we give the preliminaries of ObjectNav: the definition, the simulator, and the metrics. Then, we group the existing works into three categories: 1) end-to-end methods that directly map observations to actions, 2) modular methods that consist of a mapping module, a policy module, and a path planning module, and 3) zero-shot methods that use zero-shot learning to navigate. Finally, we summarize the performance of existing works and their main failure modes, and discuss the challenges of ObjectNav. This survey gives researchers in this field comprehensive information for a better understanding of ObjectNav.

Note to Practitioners: This work was motivated by the increased interest in real-world applications of mobile robots. Object Goal Navigation (ObjectNav), an important task in these applications, requires an agent to find an object in an unseen environment. To accomplish that, the agent must be able to move in the environment, decide where to go, and recognize object categories. So far, most work on ObjectNav has been done in simulation environments. We present an overview of the existing works in ObjectNav and introduce them in three categories. Additionally, we analyze the current performance of ObjectNav and the challenges for future research. This paper provides researchers and practitioners with a comprehensive overview of the methods developed for ObjectNav, helping them understand the task and develop suitable solutions for real-world applications.

NavTr: Object-Goal Navigation with Learnable Transformer Queries

ChatNav: Leveraging LLM to Zero-shot Semantic Reasoning in Object Navigation

TransNav: spatial sequential transformer network for visual navigation

NavFormer: A Transformer Architecture for Robot Target-Driven Navigation in Unknown and Dynamic Environments

Goal-Guided Transformer-Enabled Reinforcement Learning for Efficient Autonomous Navigation

Relation-wise transformer network and reinforcement learning for visual navigation

SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation

ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings

Object Goal Navigation using Goal-Oriented Semantic Exploration

Control Transformer: Robot Navigation in Unknown Environments through PRM-Guided Return-Conditioned Sequence Modeling

A Survey of Object Goal Navigation

Target-Driven Structured Transformer Planner for Vision-Language Navigation

3D-Aware Object Goal Navigation Via Simultaneous Exploration and Identification

Topological Planning with Transformers for Vision-and-Language Navigation

Causality-Aware Transformer Networks for Robotic Navigation

ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments

VLAI: Exploration and exploitation based on visual-language aligned information for robotic object goal navigation

OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models

MaAST: Map Attention with Semantic Transformers for Efficient Visual Navigation

A transformer-based deep reinforcement learning approach to spatial navigation in a partially observable Morris Water Maze

CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation