Abstract:Vision-language navigation (VLN) is a critical domain within embedded intelligence, requiring agents to navigate 3D environments based on natural language instructions. Traditional VLN research has focused on improving environmental understanding and decision accuracy. However, these approaches often exhibit a significant performance gap when agents are deployed in novel environments, mainly due to the limited diversity of training data. Expanding datasets to cover a broader range of environments is impractical and costly. We propose the Vision-Language Navigation with Continual Learning (VLNCL) paradigm to address this challenge. In this paradigm, agents incrementally learn new environments while retaining previously acquired knowledge. VLNCL enables agents to maintain an environmental memory and extract relevant knowledge, allowing rapid adaptation to new environments while preserving existing information. We introduce a novel dual-loop scenario replay method (Dual-SR) inspired by brain memory replay mechanisms integrated with VLN agents. This method facilitates consolidating past experiences and enhances generalization across new tasks. By utilizing a multi-scenario memory buffer, the agent efficiently organizes and replays task memories, thereby bolstering its ability to adapt quickly to new environments and mitigating catastrophic forgetting. Our work pioneers continual learning in VLN agents, introducing a novel experimental setup and evaluation metrics. We demonstrate the effectiveness of our approach through extensive evaluations and establish a benchmark for the VLNCL paradigm. Comparative experiments with existing continual learning and VLN methods show significant improvements, achieving state-of-the-art performance in continual learning ability and highlighting the potential of our approach in enabling rapid adaptation while preserving prior knowledge.

Constraint-Aware Zero-Shot Vision-Language Navigation in Continuous Environments

Open-Nav: Exploring Zero-Shot Vision-and-Language Navigation in Continuous Environment with Open-Source LLMs

InstructNav: Zero-shot System for Generic Instruction Navigation in Unexplored Environment

Zero-Shot Vision-and-Language Navigation with Collision Mitigation in Continuous Environment

$A^2$Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models

CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation

Narrowing the Gap between Vision and Action in Navigation

TINA: Think, Interaction, and Action Framework for Zero-Shot Vision Language Navigation

Safe-VLN: Collision Avoidance for Vision-and-Language Navigation of Autonomous Robots Operating in Continuous Environments

VLN-Game: Vision-Language Equilibrium Search for Zero-Shot Semantic Navigation

MO-VLN: A Multi-Task Benchmark for Open-set Zero-Shot Vision-and-Language Navigation

UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation

Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation

Vision-Language Navigation with Continual Learning

Continual Vision-and-Language Navigation

Mind the Gap: Improving Success Rate of Vision-and-Language Navigation by Revisiting Oracle Success Routes

OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models

Vision-Language Navigation Policy Learning and Adaptation

Zero-shot Object Navigation with Vision-Language Foundation Models Reasoning.

Vision-Language Navigation With Self-Supervised Auxiliary Reasoning Tasks

ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments