Abstract: Navigating indoor environments presents significant challenges for visually impaired individuals due to complex layouts and the absence of GPS signals. This paper introduces a novel system that provides turn-by-turn navigation inside buildings using only a smartphone equipped with a camera, leveraging multimodal models, deep learning algorithms, and large language models (LLMs). The smartphone's camera captures real-time images of the surroundings, which are then sent to a nearby Raspberry Pi capable of running on-device LLMs, multimodal models, and deep learning algorithms to detect and recognize architectural features, signage, and obstacles. The interpreted visual data is then translated into natural language instructions by an LLM running on the Raspberry Pi, and these instructions are sent back to the user, offering intuitive and context-aware guidance via audio prompts. This solution places minimal workload on the user's device, preventing it from being overloaded and offering compatibility with all types of devices, including those incapable of running AI models. This approach enables the client not only to run advanced models but also to ensure that the training data and other information do not leave the building. Preliminary evaluations demonstrate the system's effectiveness in accurately guiding users through complex indoor spaces, highlighting its potential for widespread application.

What problem does this paper attempt to address?

The paper aims to provide a reliable indoor navigation solution for visually impaired people. Indoor environments have complex layouts and lack GPS signals, so existing navigation technologies struggle to support these users effectively. The research develops a system based on smartphone cameras, deep-learning algorithms, multimodal models, and large language models (LLMs) to provide real-time, step-by-step indoor navigation guidance.

### Main Problems and Challenges

1. **Complex Indoor Environment**: Unlike outdoor navigation, indoor spaces usually have complex layouts and lack unified navigation aids.
2. **Privacy and Data Security**: Some existing solutions rely on cloud services, which may lead to the leakage of sensitive information.
3. **Device Compatibility**: Many advanced AI models require high-performance hardware, and not all users' devices can meet these requirements.
4. **Energy Consumption**: Running complex AI models may cause excessive energy consumption on users' devices, degrading the user experience.

### Solutions

The paper proposes a system that uses the smartphone camera to capture real-time images of the surrounding environment and processes them on a nearby Raspberry Pi. The Raspberry Pi runs pre-trained deep-learning models, multimodal models, and large language models (LLMs) to detect and identify architectural features, signs, and obstacles. The processed visual data is converted into natural-language instructions and conveyed to the user through audio prompts, providing intuitive and context-aware navigation guidance.

### Key Technologies and Methods

- **Deep-learning and Multimodal Models**: Process images and extract key information, such as signs, doors, and other important elements.
- **Large Language Models (LLMs)**: Convert visual information into easy-to-understand natural-language instructions.
- **Edge Computing**: Perform local processing on the Raspberry Pi to ensure data privacy and reduce the computational burden on users' devices.

### System Architecture

1. **User Interaction Process**:
   - The user starts the mobile application and establishes a connection with the nearby Raspberry Pi.
   - The smartphone camera captures real-time images of the surrounding environment and transmits them to the Raspberry Pi.
   - The Raspberry Pi analyzes the images, generates natural-language instructions, and returns them to the user as audio.
2. **Raspberry Pi System**:
   - Runs multimodal and deep-learning models capable of image recognition and text extraction.
   - Hosts a local large language model (LLM), such as Llama, which converts simple text into detailed descriptive narratives.
3. **Mobile Application**:
   - Serves as the user interface and establishes a connection with the Raspberry Pi system.
   - Captures and transmits real-time video or images, and receives detailed turn-by-turn navigation instructions.

### Advantages

- **Privacy Protection**: All data processing is completed locally, avoiding the leakage of sensitive information.
- **Energy Efficiency**: Offloading computation to the Raspberry Pi reduces the energy consumption of users' devices.
- **Wide Compatibility**: Applicable to various mobile devices, including those without advanced AI chips.

### Conclusions and Future Work

The system demonstrates how modern AI technologies can provide reliable indoor navigation assistance for the visually impaired. Future work will include adding other sensors (such as LiDAR or ultrasonic sensors) to improve obstacle detection, further enhancing the system's dynamic adaptability and security. In addition, expanding multi-language support and customized navigation instructions will make the system more user-friendly and widely available. Through these improvements, the system is expected to significantly improve the independence and quality of life of visually impaired people, helping them handle the challenges of daily life more confidently.
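
The Raspberry Pi side of the user interaction process described above (recognized elements in, one spoken instruction out) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `Detection` schema, the 15-degree left/right threshold, the prompt wording, and the `llm` callable are all assumptions made for illustration, and the vision models and the local LLM are stubbed out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Detection:
    """One recognized element in a camera frame (hypothetical schema)."""
    label: str          # e.g. "door", "exit sign", "chair"
    bearing_deg: float  # negative = left of the camera axis, positive = right
    distance_m: float   # estimated distance from the user


def describe(det: Detection) -> str:
    """Turn one detection into a short phrase for the LLM prompt."""
    # The 15-degree threshold is an arbitrary illustrative choice.
    if det.bearing_deg < -15:
        side = "to the left"
    elif det.bearing_deg > 15:
        side = "to the right"
    else:
        side = "straight ahead"
    return f"{det.label} {side}, about {det.distance_m:.1f} m away"


def build_prompt(detections: List[Detection], destination: str) -> str:
    """Assemble the text prompt that a local LLM would receive."""
    # Nearest elements first, so close obstacles lead the scene description.
    ordered = sorted(detections, key=lambda d: d.distance_m)
    scene = "; ".join(describe(d) for d in ordered) or "nothing recognized"
    return (
        f"The user is walking to {destination}. "
        f"Visible elements: {scene}. "
        "Give one short, spoken navigation instruction."
    )


def handle_frame(
    detections: List[Detection],
    destination: str,
    llm: Callable[[str], str],
) -> str:
    """Server-side step: detections in, instruction text out.

    `llm` stands in for whatever local model is deployed (assumed interface:
    a function from prompt text to generated text).
    """
    return llm(build_prompt(detections, destination)).strip()
```

In a deployment along the lines the summary describes, the returned string would be sent back to the phone and read aloud by its text-to-speech engine. Keeping the prompt construction separate from the model call makes it possible to test the scene description without loading a model.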

Turn-by-Turn Indoor Navigation for the Visually Impaired

Deep Learning-Based Positioning of Visually Impaired People in Indoor Environments

Visual Localization of Key Positions for Visually Impaired People

Vision-Based Mobile Indoor Assistive Navigation Aid for Blind People

Inavigation: an Image Based Indoor Navigation System

Indoor-Guided Navigation for People Who Are Blind: Crowdsourcing for Route Mapping and Assistance

A Novel Three-Dimensional Navigation Method for the Visually Impaired

NaVIP: An Image-Centric Indoor Navigation Solution for Visually Impaired People

A Lightweight Approach to Localization for Blind and Visually Impaired Travelers

Simple Smartphone-Based Guiding System for Visually Impaired People

Smartphone-Based Indoor Visual Navigation with Leader-Follower Mode

Outdoor Navigation Assistive System Based on Robust and Real-Time Visual-Auditory Substitution Approach

A Vision Aid for the Visually Impaired using Commodity Dual-Rear-Camera Smartphones

Implementation of a Blind navigation method in outdoors/indoors areas

Smartphone Based Indoor Navigation for Blind Persons using User Profile and Simplified Building Information Model

Autonomous Mapping and Navigation using Fiducial Markers and Pan-Tilt Camera for Assisting Indoor Mobility of Blind and Visually Impaired People

Indoor Navigation Assistance System for Visually Impaired with Semantic Segmentation using EdgeTPU

ASSIST: Evaluating the usability and performance of an indoor navigation assistant for blind and visually impaired people

A Deep Learning Based Model to Assist Blind People in Their Navigation

All the Way There and Back: Inertial-Based, Phone-in-Pocket Indoor Wayfinding and Backtracking Apps for Blind Travelers

Corridor-Walker: Mobile Indoor Walking Assistance for Blind People to Avoid Obstacles and Recognize Intersections