Video Question Answering for People with Visual Impairments Using an Egocentric 360-Degree Camera

Inpyo Song, Minjun Joo, Joonhyung Kwon, Jangwon Lee
2024-05-30
Abstract: This paper addresses the daily challenges encountered by visually impaired individuals, such as limited access to information, navigation difficulties, and barriers to social interaction. To alleviate these challenges, we introduce a novel visual question answering dataset. Our dataset offers two significant advancements over previous datasets: Firstly, it features videos captured using a 360-degree egocentric wearable camera, enabling observation of the entire surroundings, departing from the static image-centric nature of prior datasets. Secondly, unlike datasets centered on singular challenges, ours addresses multiple real-life obstacles simultaneously through an innovative visual-question answering framework. We validate our dataset using various state-of-the-art VideoQA methods and diverse metrics. Results indicate that while progress has been made, satisfactory performance levels for AI-powered assistive services remain elusive for visually impaired individuals. Additionally, our evaluation highlights the distinctive features of the proposed dataset, featuring ego-motion in videos captured via 360-degree cameras across varied scenarios.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper primarily addresses the challenges faced by Visually Impaired Persons (VIPs) in their daily lives, such as limited access to information, navigation difficulties, and barriers to social interaction. It proposes a new video question-answering dataset called VIEW-QA (Visually Impaired Equipped with Wearable 360-degree camera Question Answering). The dataset is built with the following aims:

1. **Improving the quality of life for visually impaired persons**: The dataset is recorded with a 360-degree panoramic wearable camera, so systems built on it can help visually impaired individuals better understand their surroundings and live more independently.
2. **Covering various daily challenges**: Unlike previous datasets that focus on a single task, VIEW-QA is designed to address multiple real-world challenges at once, including social interaction, environmental perception, object recognition, navigation, and safety.
3. **Utilizing dynamic visual input**: Existing datasets rely on static images. VIEW-QA uses video, which captures dynamic and complex scene changes and is therefore closer to the actual needs of visually impaired persons.
4. **Promoting the development of AI-assisted technologies**: The dataset's multi-faceted question and answer annotations provide a resource for developing AI systems that can interpret complex visual scenes and give visually impaired persons timely, relevant information.

In summary, this research aims to advance AI-assisted technologies by constructing the VIEW-QA dataset, thereby better supporting visually impaired persons in overcoming the challenges of their daily lives.