Abstract: Navigation presents a significant challenge for persons with visual impairments (PVI). While traditional aids such as white canes and guide dogs are invaluable, they fall short in delivering detailed spatial information and precise guidance to desired locations. Recent developments in large language models (LLMs) and vision-language models (VLMs) offer new avenues for enhancing assistive navigation. In this paper, we introduce Guide-LLM, an embodied LLM-based agent designed to assist PVI in navigating large indoor environments. Our approach features a novel text-based topological map that enables the LLM to plan global paths using a simplified environmental representation, focusing on straight paths and right-angle turns to facilitate navigation. Additionally, we utilize the LLM's commonsense reasoning for hazard detection and personalized path planning based on user preferences. Simulated experiments demonstrate the system's efficacy in guiding PVI, underscoring its potential as a significant advancement in assistive technology. The results highlight Guide-LLM's ability to offer efficient, adaptive, and personalized navigation assistance, pointing to promising advancements in this field.

What problem does this paper attempt to address?

This paper addresses how large language models (LLMs) and a text-based topological map can help persons with visual impairments (PVI) navigate large indoor environments. Traditional aids such as white canes and guide dogs remain important, but they provide limited environmental detail and cannot give precise guidance to a destination. The paper proposes a framework named Guide-LLM, which aims to overcome these limitations in the following ways:

1. **Innovative Navigation Agent Framework**: Introduce an LLM-based navigation agent that understands users' queries and generates appropriate navigation instructions through commonsense reasoning.
2. **Integration of Text-Based Topological Maps and Image Vector Databases**: Combine a text-based topological map with an image vector database, enabling the LLM to perform high-level planning and reducing the dependence on detailed user input.
3. **Simulation Experiment Verification**: Evaluate the method in simulated experiments, demonstrating its potential for guiding people with visual impairments.
4. **Personalized Navigation Experience**: Use the natural language capabilities of the LLM to tailor navigation to users' preferences and needs, such as adjusting walking speed, choosing quiet routes, or avoiding potentially dangerous areas.
5. **Enhanced Safety**: Use the LLM's commonsense reasoning to alert users when potential hazards (such as slippery floors or warning signs) are detected and to suggest alternative routes.

In summary, the main objective of this paper is to provide more efficient, adaptable, and personalized navigation for people with visual impairments by combining an LLM agent with a simplified, text-based map representation.
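To make the idea of a text-based topological map concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical one-edge-per-line text format (`node heading neighbor distance`), restricts headings to N/E/S/W so that every turn is a right angle, and uses plain breadth-first search in place of the LLM planner. The node names, the distances, and the `avoid` parameter (standing in for a user preference or a detected hazard) are all invented for illustration.

```python
from collections import deque

# Hypothetical text map: each line is "node heading neighbor distance_m".
# Headings are limited to N/E/S/W, so every turn is a right angle.
MAP_TEXT = """
lobby E hall_a 10
hall_a N hall_b 8
hall_a E cafe 6
hall_b E room_101 6
cafe N room_101 8
"""

OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}
ORDER = "NESW"  # clockwise order, used to work out left/right turns


def parse_map(text):
    """Build an adjacency dict {node: [(heading, neighbor, metres), ...]}."""
    graph = {}
    for line in text.strip().splitlines():
        src, heading, dst, dist = line.split()
        graph.setdefault(src, []).append((heading, dst, int(dist)))
        # Corridors are walkable both ways, so add the reverse edge too.
        graph.setdefault(dst, []).append((OPPOSITE[heading], src, int(dist)))
    return graph


def plan(graph, start, goal, avoid=()):
    """Breadth-first search for a fewest-segments path, skipping `avoid` nodes."""
    queue = deque([start])
    came_from = {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for heading, nxt, dist in graph[node]:
            if nxt not in came_from and nxt not in avoid:
                came_from[nxt] = (node, heading, dist)
                queue.append(nxt)
    if goal not in came_from:
        return None  # no route that respects the avoid set
    steps = []
    node = goal
    while came_from[node] is not None:
        prev, heading, dist = came_from[node]
        steps.append((heading, dist, node))
        node = prev
    return steps[::-1]


def turn(prev_heading, heading):
    """Describe the turn needed to go from one compass heading to another."""
    diff = (ORDER.index(heading) - ORDER.index(prev_heading)) % 4
    return {0: "continue straight", 1: "turn right",
            2: "turn around", 3: "turn left"}[diff]


def instructions(steps, facing):
    """Turn a list of (heading, metres, node) steps into spoken-style lines."""
    lines = []
    for heading, dist, node in steps:
        lines.append(f"{turn(facing, heading)}, walk {dist} m to {node}")
        facing = heading
    return lines


graph = parse_map(MAP_TEXT)
for line in instructions(plan(graph, "lobby", "room_101"), facing="E"):
    print(line)
```

In this toy map the default route passes through `hall_b`; calling `plan(graph, "lobby", "room_101", avoid={"hall_b"})` reroutes through `cafe` instead, which is one simple way a preference or hazard could be expressed. In the paper the planning itself is delegated to the LLM reading the text map, so the search here only illustrates why a map of straight segments and right-angle turns is easy to reason over.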

Guide-LLM: An Embodied LLM Agent and Text-Based Topological Map for Robotic Guidance of People with Visual Impairments

Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs

Combination Drug Therapy with Calcium‐Channel Blockers in the Treatment of Systemic Hypertension

Vision-Based Mobile Indoor Assistive Navigation Aid for Blind People

Enhancing the Travel Experience for People with Visual Impairments through Multimodal Interaction: NaviGPT, A Real-Time AI-Driven Mobile Navigation System

Can LLMs Generate Human-Like Wayfinding Instructions? Towards Platform-Agnostic Embodied Instruction Synthesis

Memory-Maze: Scenario Driven Benchmark and Visual Language Navigation Model for Guiding Blind People

Vision and Language Navigation in the Real World via Online Visual Language Mapping

L3MVN: Leveraging Large Language Models for Visual Target Navigation

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

Transforming a Quadruped into a Guide Robot for the Visually Impaired: Formalizing Wayfinding, Interaction Modeling, and Safety Mechanism

Navigation Agents for the Visually Impaired: A Sidewalk Simulator and Experiments

BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes

See Spot Guide: Accessible Interfaces for an Assistive Quadruped Robot

The Development of LLMs for Embodied Navigation

VLM-Social-Nav: Socially Aware Robot Navigation through Scoring using Vision-Language Models

Intelligent LiDAR Navigation: Leveraging External Information and Semantic Maps with LLM as Copilot

Functional cooperation between JunD and NF-κB in rat hepatocytes

ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination

Robot Navigation Using Physically Grounded Vision-Language Models in Outdoor Environments

Co-NavGPT: Multi-Robot Cooperative Visual Semantic Navigation using Large Language Models