NaVIP: An Image-Centric Indoor Navigation Solution for Visually Impaired People

Jun Yu, Yifan Zhang, Badrinadh Aila, Vinod Namboodiri
2024-10-09
Abstract: Indoor navigation is challenging due to the absence of satellite positioning. The challenge is far greater for Visually Impaired People (VIPs), who cannot get information from wayfinding signage. Other sensor signals (e.g., Bluetooth and LiDAR) can be used to create turn-by-turn navigation solutions with position updates for users. Unfortunately, these solutions require tags to be installed throughout the environment or rely on fairly expensive hardware. Moreover, they require a high degree of manual involvement, which raises costs and hampers scalability. We propose an image dataset and an associated image-centric solution called NaVIP, a step towards visual intelligence that is infrastructure-free and task-scalable and that can assist VIPs in understanding their surroundings. Specifically, we start by curating large-scale phone camera data in a four-floor research building, with 300K images, to lay the foundation for an inclusive, image-centric indoor navigation and exploration solution. Every image is labelled with a precise 6DoF camera pose, details of indoor PoIs, and descriptive captions to assist VIPs. We benchmark two main aspects, 1) the positioning system and 2) exploration support, prioritizing training scalability and real-time inference, to validate the prospect of an image-based solution for indoor navigation. The dataset, code, and model checkpoints are publicly available at https://github.com/junfish/VIP_Navi.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses **the challenge of independent indoor navigation for visually impaired people (VIPs)**. The authors propose NaVIP, an image-based indoor navigation solution that helps VIPs understand their surroundings and find their way.

#### Main problems:

1. **Lack of effective indoor positioning technologies**: Satellite positioning systems such as GPS do not work indoors. Alternatives such as Wi-Fi, Bluetooth, and LiDAR can provide positioning to some extent, but they require additional hardware and are costly to maintain, which makes large-scale deployment difficult.
2. **Special challenges faced by visually impaired people**: VIPs cannot read signage and have difficulty building mental maps, so navigating independently in an unfamiliar indoor environment is very hard. This is more than a matter of convenience: it can limit their access to unfamiliar indoor spaces altogether.
3. **Limitations of existing technologies**: Existing indoor navigation technologies can meet the needs of sighted users to some extent, but they perform poorly in dynamically changing environments, especially when real-time, precise positioning and an understanding of the surroundings are required. They therefore fall short of what VIPs need.

#### Solutions:

- **Image-based indoor navigation system (NaVIP)**: The system uses images captured by smartphone cameras and a deep-learning model to provide precise indoor positioning and descriptions of the environment. Unlike methods that rely on external sensors or hardware, NaVIP is infrastructure-free, which makes it more scalable and adaptable.
- **Construction of a large-scale image dataset**: To train and validate the system, the authors built a dataset of approximately 300,000 images. Each image is labeled with an accurate 6-degree-of-freedom (6DoF) camera pose, indoor points of interest (PoIs), and descriptive text that helps VIPs understand the environment.
- **Real-time inference and task scalability**: With end-to-end training and inference, the system responds to a query image within a few milliseconds and provides real-time navigation support. Its architecture also lets it adapt to scenes of different sizes and complexities.

With these methods, NaVIP aims to give VIPs a low-cost, efficient, and easy-to-use indoor navigation tool that helps them move more confidently through complex indoor environments.
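The per-image labels and the positioning task described above can be illustrated with a minimal Python sketch. It is not taken from the NaVIP repository: the class names, field names, and the position-plus-quaternion representation are assumptions made for illustration, and the summary does not say which error metrics the authors report. The two functions compute the translation and rotation errors commonly used to compare a predicted camera pose with a ground-truth one.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Pose6DoF:
    """Hypothetical 6DoF pose: 3D position plus orientation quaternion."""
    position: tuple    # (x, y, z), assumed to be in metres
    quaternion: tuple  # (w, x, y, z), need not be pre-normalized


@dataclass
class LabeledImage:
    """Hypothetical record mirroring the three label types in the summary."""
    image_path: str
    pose: Pose6DoF
    pois: list = field(default_factory=list)  # nearby points of interest
    caption: str = ""                         # descriptive text for the user


def _normalize(q):
    """Scale a quaternion to unit length."""
    norm = math.sqrt(sum(c * c for c in q))
    if norm == 0.0:
        raise ValueError("zero-length quaternion")
    return tuple(c / norm for c in q)


def translation_error(pred, truth):
    """Euclidean distance between predicted and true camera positions."""
    return math.dist(pred.position, truth.position)


def rotation_error_deg(pred, truth):
    """Smallest rotation angle, in degrees, between the two orientations."""
    qa, qb = _normalize(pred.quaternion), _normalize(truth.quaternion)
    # abs() handles the fact that q and -q encode the same rotation;
    # min() guards acos against rounding slightly above 1.0.
    dot = min(1.0, abs(sum(a * b for a, b in zip(qa, qb))))
    return math.degrees(2.0 * math.acos(dot))


# Example: the prediction is 5 m away and rotated a quarter turn about z.
truth = LabeledImage(
    image_path="floor1/example.jpg",  # made-up path
    pose=Pose6DoF((0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0)),
    pois=["elevator"],
    caption="An elevator is directly ahead.",
)
half = math.pi / 4  # half of a 90-degree rotation
predicted = Pose6DoF((3.0, 4.0, 0.0), (math.cos(half), 0.0, 0.0, math.sin(half)))

t_err = translation_error(predicted, truth.pose)
r_err = rotation_error_deg(predicted, truth.pose)
print(f"translation error: {t_err:.2f} m, rotation error: {r_err:.1f} deg")
```

In this example the translation error is 5 m and the rotation error is about 90 degrees. Keeping the pose, PoIs, and caption in one record reflects the idea that a single image query can serve both positioning and exploration support.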