NL-SLAM for OC-VLN: Natural Language Grounded SLAM for Object-Centric VLN

Sonia Raychaudhuri, Duy Ta, Katrina Ashton, Angel X. Chang, Jiuguang Wang, Bernadette Bucher
2024-11-12
Abstract: Landmark-based navigation (e.g. go to the wooden desk) and relative positional navigation (e.g. move 5 meters forward) are distinct navigation challenges solved very differently in existing robotics navigation methodology. We present a new dataset, OC-VLN, in order to distinctly evaluate grounding object-centric natural language navigation instructions in a method for performing landmark-based navigation. We also propose Natural Language grounded SLAM (NL-SLAM), a method to ground natural language instruction to robot observations and poses. We actively perform NL-SLAM in order to follow object-centric natural language navigation instructions. Our methods leverage pre-trained vision and language foundation models and require no task-specific training. We construct two strong baselines from state-of-the-art methods on related tasks, Object Goal Navigation and Vision Language Navigation, and we show that our approach, NL-SLAM, outperforms these baselines across all our metrics of success on OC-VLN. Finally, we successfully demonstrate the effectiveness of NL-SLAM for performing navigation instruction following in the real world on a Boston Dynamics Spot robot.
Robotics, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to combine natural language instructions with environmental observations so that a robot can follow object-centric navigation instructions. It introduces a new dataset, OC-VLN, which focuses on the object-centric navigation instruction-following task. It also proposes a new method, Natural Language grounded SLAM (NL-SLAM), which grounds natural language instructions to the robot's observations and poses so that the instructions can be executed accurately. After receiving an instruction, the robot uses the prior information the instruction provides together with observations of the actual environment to complete the navigation task. The work aims to bridge the gap between language guidance and spatial perception and to improve the autonomous navigation ability of robots in complex environments.