Reflex-based open-vocabulary navigation without prior knowledge using omnidirectional camera and multiple vision-language models
Kento Kawaharazuka, Yoshiki Obinata, Naoaki Kanazawa, Naoto Tsukamoto, Kei Okada, Masayuki Inaba
The Department of Mechano-Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
Kento Kawaharazuka is a project assistant professor at JSK Robotics Laboratory in the Department of Mechano-Informatics at the University of Tokyo. He received his B.E., M.S., and Ph.D. degrees in Mechano-Informatics from the University of Tokyo in 2017, 2019, and 2022, respectively. His research interests include musculoskeletal humanoids, tendon-driven robots, machine learning, and foundation models.
Yoshiki Obinata is a Ph.D. course student at JSK Robotics Laboratory in the Department of Mechano-Informatics at the University of Tokyo. He received his B.E. and M.S. degrees in Mechano-Informatics from the University of Tokyo in 2021 and 2023, respectively. He was awarded the IEEE Robotics and Automation Society Japan Joint Chapter Young Award and the SICE International Young Authors Award in 2023. His research interests include robot system integration and the development of robot communication systems.
Naoaki Kanazawa is a Ph.D. course student at JSK Robotics Laboratory in the Department of Mechano-Informatics at the University of Tokyo. He received his B.E. and M.S. degrees in Mechano-Informatics from the University of Tokyo in 2021 and 2023, respectively. His research interests include cooking robot systems.
Naoto Tsukamoto is a Master course student at JSK Robotics Laboratory in the Department of Mechano-Informatics at the University of Tokyo. He received his B.E. degree in Mechano-Informatics from the University of Tokyo in 2022. His research interests include task cooperation between humans and robots.
Kei Okada received his B.E. in Computer Science from Kyoto University in 1997. He received his M.S. and Ph.D. in Information Engineering from the University of Tokyo in 1999 and 2002, respectively. From 2002 to 2006, he participated in the Professional Program for Strategic Software Project at the University of Tokyo. He was appointed as a lecturer in Creative Informatics at the University of Tokyo in 2006, and became an associate professor and then a professor in the Department of Mechano-Informatics in 2009 and 2018, respectively. His research interests include humanoid robots, real-time 3D computer vision, and recognition-action integrated systems.
Masayuki Inaba graduated from the Department of Mechanical Engineering at the University of Tokyo in 1981, and received his M.S. and Ph.D. degrees from the Graduate School of Information Engineering at the University of Tokyo in 1983 and 1986, respectively. He was appointed as a lecturer in the Department of Mechanical Engineering at the University of Tokyo in 1986, an associate professor in 1989, and a professor in the Department of Mechano-Informatics in 2000. His research interests include key technologies of robotic systems and software architectures to advance robotics research.
DOI: https://doi.org/10.1080/01691864.2024.2393409
IF: 2.057
2024-08-22
Advanced Robotics
Abstract: Various robot navigation methods have been developed, but most are based on Simultaneous Localization and Mapping (SLAM), reinforcement learning, or similar techniques that require prior map construction or learning. In this study, we consider the simplest method that requires neither map construction nor learning, and perform open-vocabulary robot navigation without any prior knowledge. We equip the robot with an omnidirectional camera and pre-trained vision-language models. The omnidirectional camera provides a uniform view of the surroundings, eliminating the need for complicated exploratory behaviors such as trajectory generation. By applying multiple pre-trained vision-language models to the omnidirectional image and incorporating reflective behaviors, we show that navigation becomes simple and requires no prior setup. Interesting properties and limitations of our method are discussed based on experiments with the mobile robot Fetch.
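The abstract does not spell out the pipeline, so the following is only a minimal illustrative sketch of the general idea of choosing a direction from a single omnidirectional image, not the authors' implementation. It assumes an equirectangular panorama whose centre column faces straight ahead, splits it into equal-width sectors, scores each sector against a text query, and returns the heading of the best sector. The names `split_into_sectors`, `sector_heading_deg`, `choose_heading`, and the `score_fn` callback (a stand-in for a pre-trained vision-language model) are all hypothetical.

```python
from typing import Callable, List, Tuple

# A panorama is modelled as rows of pixel values; a stand-in for a real image.
Image = List[List[float]]


def split_into_sectors(panorama: Image, n_sectors: int) -> List[Image]:
    """Cut the panorama into n_sectors equal-width vertical strips."""
    width = len(panorama[0])
    step = width // n_sectors  # any leftover columns on the right are ignored
    return [[row[i * step:(i + 1) * step] for row in panorama]
            for i in range(n_sectors)]


def sector_heading_deg(index: int, n_sectors: int) -> float:
    """Heading of a sector centre in degrees, in the range [-180, 180).

    Assumption: the left image edge is -180 deg and the image centre
    corresponds to the robot's forward direction (0 deg).
    """
    return (index + 0.5) * 360.0 / n_sectors - 180.0


def choose_heading(
    panorama: Image,
    query: str,
    score_fn: Callable[[Image, str], float],
    n_sectors: int = 8,
) -> Tuple[float, float]:
    """Return (heading_deg, score) of the sector that best matches the query.

    score_fn is a hypothetical placeholder for an image-text similarity
    computed by a vision-language model.
    """
    sectors = split_into_sectors(panorama, n_sectors)
    scores = [score_fn(sector, query) for sector in sectors]
    best = max(range(n_sectors), key=scores.__getitem__)
    return sector_heading_deg(best, n_sectors), scores[best]
```

In a sketch like this, a robot would repeatedly capture a panorama, call `choose_heading` with the user's query, turn toward the returned heading, and move forward, with no map or training step involved. How the actual system combines several models and its reflex behaviors is described in the paper body, not here.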
robotics