Generating Contextually-Relevant Navigation Instructions for Blind and Low Vision People

Zain Merchant, Abrar Anwar, Emily Wang, Souti Chattopadhyay, Jesse Thomason
2024-07-11
Abstract: Navigating unfamiliar environments presents significant challenges for blind and low-vision (BLV) individuals. In this work, we construct a dataset of images and goals across different scenarios such as searching through kitchens or navigating outdoors. We then investigate how grounded instruction generation methods can provide contextually-relevant navigational guidance to users in these instances. Through a sighted user study, we demonstrate that large pretrained language models can produce correct and useful instructions perceived as beneficial for BLV users. We also conduct a survey and interview with 4 BLV users and observe useful insights on preferences for different instructions based on the scenario.
Computation and Language, Human-Computer Interaction
What problem does this paper attempt to address?
This paper addresses the challenges that blind and low-vision (BLV) individuals face when navigating unfamiliar environments. The authors construct a dataset of images paired with goals across different scenarios, such as searching through a kitchen or navigating outdoors, and investigate how grounded instruction generation methods built on large language models (LLMs) and vision-language models (VLMs) can provide contextually relevant navigation guidance in these settings. In a user study with sighted participants, they show that large pretrained language models can produce instructions that are correct and useful, and that the participants perceive as beneficial for BLV users. The authors also survey and interview 4 BLV users, gaining insights into which kinds of instructions they prefer in different scenarios. Overall, the paper evaluates how effective and contextually relevant LLM- and VLM-based navigation guidance is for BLV users.