Kento Kawaharazuka, Yoshiki Obinata, Naoaki Kanazawa, Naoto Tsukamoto, Kei Okada, Masayuki Inaba

The Department of Mechano-Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan

Kento Kawaharazuka is a project assistant professor at JSK Robotics Laboratory in the Department of Mechano-Informatics at the University of Tokyo. He received his B.E., M.S., and Ph.D. degrees in Mechano-Informatics from the University of Tokyo in 2017, 2019, and 2022, respectively. His research interests include musculoskeletal humanoids, tendon-driven robots, machine learning, and foundation models.

Yoshiki Obinata is a Ph.D. course student at JSK Robotics Laboratory in the Department of Mechano-Informatics at the University of Tokyo. He received his B.E. and M.S. degrees in Mechano-Informatics from the University of Tokyo in 2021 and 2023, respectively. He was awarded the IEEE Robotics and Automation Society Japan Joint Chapter Young Award and the SICE International Young Authors Award in 2023. His research interests include robot system integration and the development of robot communication systems.

Naoaki Kanazawa is a Ph.D. course student at JSK Robotics Laboratory in the Department of Mechano-Informatics at the University of Tokyo. He received his B.E. and M.S. degrees in Mechano-Informatics from the University of Tokyo in 2021 and 2023, respectively. His research interests include cooking robot systems.

Naoto Tsukamoto is a Master course student at JSK Robotics Laboratory in the Department of Mechano-Informatics at the University of Tokyo. He received his B.E. degree in Mechano-Informatics from the University of Tokyo in 2022. His research interests include task cooperation between humans and robots.

Kei Okada received his B.E. in Computer Science from Kyoto University in 1997. He received his M.S. and Ph.D. in Information Engineering from the University of Tokyo in 1999 and 2002, respectively. From 2002 to 2006, he participated in the Professional Program for Strategic Software Project at the University of Tokyo. He was appointed as a lecturer in Creative Informatics at the University of Tokyo in 2006, and became an associate professor and then a professor in the Department of Mechano-Informatics in 2009 and 2018, respectively. His research interests include humanoid robots, real-time 3D computer vision, and recognition-action integrated systems.

Masayuki Inaba graduated from the Department of Mechanical Engineering at the University of Tokyo in 1981, and received his M.S. and Ph.D. degrees from the Graduate School of Information Engineering at the University of Tokyo in 1983 and 1986, respectively. He was appointed as a lecturer in the Department of Mechanical Engineering at the University of Tokyo in 1986, an associate professor in 1989, and a professor in the Department of Mechano-Informatics in 2000. His research interests include key technologies of robotic systems and software architectures to advance robotics research.

Abstract: Various robot navigation methods have been developed, but most are based on Simultaneous Localization and Mapping (SLAM), reinforcement learning, or similar techniques, which require prior map construction or learning. In this study, we consider the simplest method, one that requires neither map construction nor learning, and perform open-vocabulary robot navigation without any prior knowledge. We apply an omnidirectional camera and pre-trained vision-language models to the robot. The omnidirectional camera provides a uniform view of the surroundings, eliminating the need for complicated exploratory behaviors such as trajectory generation. By applying multiple pre-trained vision-language models to the omnidirectional image and incorporating reflex-based behaviors, we show that navigation becomes simple and requires no prior setup. Interesting properties and limitations of our method are discussed based on experiments with the mobile robot Fetch.
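
The idea of choosing a direction from a single omnidirectional image can be pictured with a minimal sketch. It is not the paper's implementation: the sector split, the `score` callback (a hypothetical stand-in for a vision-language model's match between an image region and the instruction), and the convention that the panorama's center column faces straight ahead are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Tuple

# Stand-in image type: rows of pixel values. A real system would use an
# image array from the omnidirectional camera instead.
Image = List[List[float]]


def split_into_sectors(panorama: Image, n_sectors: int) -> List[Image]:
    """Cut a panorama into n_sectors vertical strips of near-equal width."""
    width = len(panorama[0])
    bounds = [round(i * width / n_sectors) for i in range(n_sectors + 1)]
    return [
        [row[bounds[i]:bounds[i + 1]] for row in panorama]
        for i in range(n_sectors)
    ]


def sector_heading(index: int, n_sectors: int) -> float:
    """Heading in radians of a sector's center.

    Assumption: column 0 points directly behind the robot (-pi) and the
    center column points straight ahead (0).
    """
    return -math.pi + (index + 0.5) * 2.0 * math.pi / n_sectors


def choose_heading(
    panorama: Image,
    score: Callable[[Image], float],
    n_sectors: int = 8,
) -> Tuple[float, float]:
    """Return (heading, best_score) for the highest-scoring sector.

    `score` is a hypothetical placeholder for a vision-language model that
    rates how well an image region matches the spoken or typed target.
    """
    sectors = split_into_sectors(panorama, n_sectors)
    scores = [score(sector) for sector in sectors]
    best = max(range(n_sectors), key=scores.__getitem__)
    return sector_heading(best, n_sectors), scores[best]
```

Because every direction is scored from one image, a controller built this way would only need to turn toward the returned heading and move, without planning an exploratory trajectory first.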

Visual Navigation Based on Semantic Segmentation Using Only a Monocular Camera as an External Sensor

A Computer Vision for Navigation of Mobile Robots

Visual Navigation Using a Webcam Based on Semantic Segmentation for Indoor Robots

Unifying Terrain Awareness Through Real-Time Semantic Segmentation

Monocular Vision Based Obstacle Detection for Robot Navigation in Unstructured Environment

Semantic Visual Odometry Based on Panoramic Annular Imaging

Accuracy Improvement of Semantic Segmentation Using Appropriate Datasets for Robot Navigation

A Hybrid Approach to Real-Time Robotic Visual Navigation: Integrating Detection and Scene Segmentation

Autonomous social robot navigation in unknown urban environments using semantic segmentation

See What the Robot Can't See: Learning Cooperative Perception for Visual Navigation

Probabilistic Visual Navigation with Bidirectional Image Prediction

Multi-Scale Fully Convolutional Network-Based Semantic Segmentation for Mobile Robot Navigation

Visual Representations for Semantic Target Driven Navigation

Monocular Vision Navigation and Control of Mobile Robot

Sistema de Navegação Autônomo Baseado em Visão Computacional (Autonomous Navigation System Based on Computer Vision)

Online Robot Navigation and Manipulation with Distilled Vision-Language Models

Reflex-based open-vocabulary navigation without prior knowledge using omnidirectional camera and multiple vision-language models

Real-time 3D Semantic Scene Perception for Egocentric Robots with Binocular Vision