Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments

Sangmim Song, Sarath Kodagoda, Amal Gunatilake, Marc G. Carmichael, Karthick Thiyagarajan, Jodi Martin
2024-10-28
Abstract: Navigation presents a significant challenge for persons with visual impairments (PVI). While traditional aids such as white canes and guide dogs are invaluable, they fall short in delivering detailed spatial information and precise guidance to desired locations. Recent developments in large language models (LLMs) and vision-language models (VLMs) offer new avenues for enhancing assistive navigation. In this paper, we introduce Guide-LLM, an embodied LLM-based agent designed to assist PVI in navigating large indoor environments. Our approach features a novel text-based topological map that enables the LLM to plan global paths using a simplified environmental representation, focusing on straight paths and right-angle turns to facilitate navigation. Additionally, we utilize the LLM's commonsense reasoning for hazard detection and personalized path planning based on user preferences. Simulated experiments demonstrate the system's efficacy in guiding PVI, underscoring its potential as a significant advancement in assistive technology. The results highlight Guide-LLM's ability to offer efficient, adaptive, and personalized navigation assistance, pointing to promising advancements in this field.
Robotics, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
This paper addresses how large language models (LLMs) and a text-based topological map can help people with visual impairments (PVI) navigate large indoor environments. Traditional aids such as white canes and guide dogs are important, but they provide limited environmental detail and cannot give precise guidance to a destination. The paper proposes a framework named Guide-LLM, which aims to overcome these limitations in the following ways:

1. **Innovative Navigation Agent Framework**: Introduces an LLM-based navigation agent that interprets users' queries and generates appropriate navigation instructions through commonsense reasoning.
2. **Integration of a Text-Based Topological Map and an Image Vector Database**: Combines a text-based topological map with an image vector database, enabling the LLM to perform high-level planning and reducing the dependence on detailed user input.
3. **Simulation Experiment Verification**: Evaluates the method in simulated experiments, demonstrating its potential for guiding people with visual impairments.
4. **Personalized Navigation Experience**: Uses the LLM's natural language processing capabilities to tailor navigation to users' preferences and needs, such as adjusting walking speed, choosing quiet routes, or avoiding potentially dangerous areas.
5. **Enhanced Safety**: Uses the LLM's commonsense reasoning to alert users promptly when potential hazards (such as slippery floors or warning signs) are detected, and to suggest alternative routes.

In summary, the main objective of this paper is to provide more efficient, adaptable, and personalized navigation for people with visual impairments by integrating advanced language models with novel navigation techniques.
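To make the idea of a text-based topological map concrete, the following is a minimal sketch, not the authors' implementation. It assumes the environment is stored as named nodes joined by straight segments along the four compass headings, so every turn is a multiple of 90 degrees. The node names, the `map_to_text` line format, and all function names are hypothetical. In Guide-LLM the LLM itself plans the global path from the text map; the breadth-first search here is only a stand-in that could, for example, be used to sanity-check a path the LLM proposes.

```python
from collections import deque

# Hypothetical map: node -> {compass heading: neighbouring node}.
# Node names are illustrative and do not come from the paper.
TOPO_MAP = {
    "entrance": {"north": "corridor_a"},
    "corridor_a": {"south": "entrance", "east": "corridor_b", "north": "lift_lobby"},
    "corridor_b": {"west": "corridor_a", "north": "cafe"},
    "lift_lobby": {"south": "corridor_a"},
    "cafe": {"south": "corridor_b"},
}

HEADINGS = ["north", "east", "south", "west"]  # clockwise order


def map_to_text(topo_map):
    """Serialise the map as plain text lines that could be placed in an LLM prompt."""
    lines = []
    for node, edges in topo_map.items():
        for heading, neighbour in edges.items():
            lines.append(f"{node} --{heading}--> {neighbour}")
    return "\n".join(lines)


def shortest_path(topo_map, start, goal):
    """Breadth-first search; returns a list of (heading, node) steps or None."""
    parent = {start: None}  # node -> (previous node, heading taken), None for start
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for heading, neighbour in topo_map[node].items():
            if neighbour not in parent:
                parent[neighbour] = (node, heading)
                queue.append(neighbour)
    if goal not in parent:
        return None
    steps = []
    node = goal
    while parent[node] is not None:
        prev, heading = parent[node]
        steps.append((heading, node))
        node = prev
    steps.reverse()
    return steps


def turn_instruction(current, target):
    """Translate a change of compass heading into a right-angle turn command."""
    diff = (HEADINGS.index(target) - HEADINGS.index(current)) % 4
    return ["go straight", "turn right", "turn around", "turn left"][diff]


def describe_route(steps, initial_heading):
    """Convert planned steps into spoken-style instructions for the user."""
    heading = initial_heading
    instructions = []
    for new_heading, node in steps:
        instructions.append(f"{turn_instruction(heading, new_heading)}, then walk to {node}")
        heading = new_heading
    return instructions


if __name__ == "__main__":
    print(map_to_text(TOPO_MAP))
    route = shortest_path(TOPO_MAP, "entrance", "cafe")
    for line in describe_route(route, initial_heading="north"):
        print(line)
```

Restricting edges to four headings keeps the text representation short and lets each step be expressed as "go straight", "turn left", or "turn right", which is the kind of simplified instruction the summary describes. User preferences, such as avoiding a noisy corridor, could be layered on top by removing or penalising edges before planning; that extension is likewise an assumption rather than something the source specifies.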