Kento Kawaharazuka, Yoshiki Obinata, Naoaki Kanazawa, Naoto Tsukamoto, Kei Okada, Masayuki Inaba

The Department of Mechano-Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan

Kento Kawaharazuka is a project assistant professor at JSK Robotics Laboratory in the Department of Mechano-Informatics at the University of Tokyo. He received his B.E., M.S., and Ph.D. degrees in Mechano-Informatics from the University of Tokyo in 2017, 2019, and 2022, respectively. His research interests include musculoskeletal humanoids, tendon-driven robots, machine learning, and foundation models.

Yoshiki Obinata is a Ph.D. course student at JSK Robotics Laboratory in the Department of Mechano-Informatics at the University of Tokyo. He received his B.E. and M.S. degrees in Mechano-Informatics from the University of Tokyo in 2021 and 2023, respectively. He was awarded the IEEE Robotics and Automation Society Japan Joint Chapter Young Award and the SICE International Young Authors Award in 2023. His research interests include robot system integration and the development of robot communication systems.

Naoaki Kanazawa is a Ph.D. course student at JSK Robotics Laboratory in the Department of Mechano-Informatics at the University of Tokyo. He received his B.E. and M.S. degrees in Mechano-Informatics from the University of Tokyo in 2021 and 2023, respectively. His research interests include cooking robot systems.

Naoto Tsukamoto is a Master course student at JSK Robotics Laboratory in the Department of Mechano-Informatics at the University of Tokyo. He received his B.E. degree in Mechano-Informatics from the University of Tokyo in 2022. His research interests include task cooperation between humans and robots.

Kei Okada received his B.E. in Computer Science from Kyoto University in 1997. He received his M.S. and Ph.D. in Information Engineering from the University of Tokyo in 1999 and 2002, respectively. From 2002 to 2006, he participated in the Professional Program for Strategic Software Project at the University of Tokyo. He was appointed as a lecturer in Creative Informatics at the University of Tokyo in 2006, and became an associate professor and then a professor in the Department of Mechano-Informatics in 2009 and 2018, respectively. His research interests include humanoid robots, real-time 3D computer vision, and recognition-action integrated systems.

Masayuki Inaba graduated from the Department of Mechanical Engineering at the University of Tokyo in 1981, and received his M.S. and Ph.D. degrees from the Graduate School of Information Engineering at the University of Tokyo in 1983 and 1986, respectively. He was appointed as a lecturer in the Department of Mechanical Engineering at the University of Tokyo in 1986, an associate professor in 1989, and a professor in the Department of Mechano-Informatics in 2000. His research interests include key technologies of robotic systems and software architectures to advance robotics research.

Abstract: Various robot navigation methods have been developed, but they are mainly based on Simultaneous Localization and Mapping (SLAM), reinforcement learning, and similar techniques, which require prior map construction or learning. In this study, we consider the simplest method that requires neither map construction nor learning, and execute open-vocabulary navigation of robots without any prior knowledge. We apply an omnidirectional camera and pre-trained vision-language models to the robot. The omnidirectional camera provides a uniform view of the surroundings, eliminating the need for complicated exploratory behaviors such as trajectory generation. By applying multiple pre-trained vision-language models to this omnidirectional image and incorporating reflective behaviors, we show that navigation becomes simple and does not require any prior setup. Interesting properties and limitations of our method are discussed based on experiments with the mobile robot Fetch.
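
The abstract does not spell out the pipeline, so the following is only a minimal sketch of one plausible reading: the omnidirectional image is cut into equal horizontal sectors, each sector is scored against the text query, and the robot turns toward the best-scoring sector. The function names, the sector count, and the heading convention are all assumptions made for illustration. In particular, `score_fn` is a hypothetical stand-in for whatever pre-trained vision-language model produces an image-text similarity; it is not the authors' model or any real library API.

```python
from typing import Callable, List, Sequence, Tuple

# Placeholder image type (assumption): a list of rows of pixel values.
Image = List[List[float]]


def split_into_sectors(panorama: Sequence[Sequence[float]], n_sectors: int) -> List[Image]:
    """Cut a panoramic image into n_sectors vertical strips of equal width.

    Columns left over when the width is not divisible by n_sectors are dropped.
    """
    width = len(panorama[0])
    sector_width = width // n_sectors
    sectors = []
    for k in range(n_sectors):
        start = k * sector_width
        end = start + sector_width
        sectors.append([list(row[start:end]) for row in panorama])
    return sectors


def sector_heading_deg(k: int, n_sectors: int) -> float:
    """Heading of the centre of sector k, mapped to [-180, 180) degrees.

    Assumed convention: the left edge of the panorama is -180 degrees.
    """
    return (k + 0.5) * 360.0 / n_sectors - 180.0


def choose_heading(
    panorama: Sequence[Sequence[float]],
    query: str,
    score_fn: Callable[[Image, str], float],
    n_sectors: int = 8,
) -> Tuple[float, float]:
    """Return (heading in degrees, score) of the sector that best matches the query."""
    sectors = split_into_sectors(panorama, n_sectors)
    scores = [score_fn(sector, query) for sector in sectors]
    best = max(range(n_sectors), key=lambda k: scores[k])
    return sector_heading_deg(best, n_sectors), scores[best]


def mean_brightness(sector: Image, query: str) -> float:
    """Toy scorer used only to exercise the code; it ignores the query."""
    values = [v for row in sector for v in row]
    return sum(values) / len(values)


# Toy usage: a 2x8 "panorama" whose sixth column is the brightest.
toy_panorama = [
    [0, 0, 0, 0, 0, 9, 0, 0],
    [0, 0, 0, 0, 0, 9, 0, 0],
]
heading, score = choose_heading(toy_panorama, "kitchen", mean_brightness, n_sectors=8)
```

Passing the scorer in as a callable keeps the sketch independent of any particular model. A reflex-style loop would call `choose_heading` on each new frame and issue a turn or forward command from the returned heading, with no map or trajectory planner involved.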

Reflex-based open-vocabulary navigation without prior knowledge using omnidirectional camera and multiple vision-language models
