Abstract: Audio descriptions (AD) make videos accessible for blind and low vision (BLV) users by describing visual elements that cannot be understood from the main audio track. AD created by professionals or novice describers is time-consuming and lacks scalability while offering little control to BLV viewers on description length and content and when they receive it. To address this gap, we explore user-driven AI-generated descriptions, where the BLV viewer controls when they receive descriptions. In a study, 20 BLV participants activated audio descriptions for seven different video genres with two levels of detail: concise and detailed. Our results show differences in AD frequency and level of detail BLV users wanted for different videos, their sense of control with this style of AD delivery, its limitations, and variations among BLV users in their AD needs and perception of AI-generated descriptions. We discuss the implications of our findings for future AI-based AD tools.

What problem does this paper attempt to address?

This paper aims to provide more flexible and personalized audio description (AD) for blind and low-vision (BLV) users, so as to improve their experience of watching online videos. Specifically, it focuses on the following aspects:

1. **Limitations of existing audio descriptions**:
   - Pre-recorded audio descriptions made by professional or novice describers are time-consuming to produce and do not scale.
   - Pre-recorded audio descriptions cannot be adjusted to users' real-time needs, and users have limited control over the length and content of the descriptions.
2. **Demand for user-driven AI-generated audio descriptions**:
   - The researchers explore a user-driven, AI-generated audio description method that lets BLV users activate descriptions at any time, according to their preferences and audio prompts.
   - This method aims to address the current lack of flexibility and personalization in audio descriptions and to provide descriptions that better meet user needs.
3. **Differences in requirements for different video types**:
   - Different types of videos (such as movie animations, education, health and fitness, etc.) call for audio descriptions at different frequencies and levels of detail.
   - The study examined what BLV users want from audio descriptions for different video types, as well as their impressions and feedback when using this new method.
4. **User perception and experience**:
   - The study also explored BLV users' perception of user-driven AI-generated audio descriptions, including their acceptance of the method, the positive and negative aspects of using it, and suggestions for future improvements.
### Research objectives

To answer these questions, the researchers developed a prototype system that allows BLV users to activate short or detailed audio descriptions by pressing the C or D keys on the keyboard while watching videos. In a study with 20 BLV users, they collected data on how often, and at what level of detail, participants requested audio descriptions for different video types, and analyzed users' views and experiences of the system.

### Main contributions

- Provided insights into BLV users' views and experiences of user-driven AI-generated audio descriptions.
- Collected quantitative data showing how the frequency and level of detail of requested audio descriptions vary across video types.
- Analyzed users' perception and experience when using user-driven AI-generated audio descriptions, providing a reference for the future design of AI-based audio description platforms.

Through these efforts, the research aims to advance audio description technology so that it better fits the needs of BLV users and improves their video-watching experience.
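The key-press interaction described under Research objectives can be illustrated with a minimal sketch. This is not the authors' implementation: the `DescriptionLog` class, the `KEY_TO_LEVEL` mapping (C for concise, D for detailed), and the logged fields are all assumptions made for illustration. The sketch only shows how each request could be recorded so that request counts per video type and detail level can be tallied afterwards.

```python
from collections import Counter
from dataclasses import dataclass, field

# Assumed mapping: the summary only says the C and D keys trigger short or
# detailed descriptions; which key maps to which level is a guess.
KEY_TO_LEVEL = {"c": "concise", "d": "detailed"}


@dataclass
class DescriptionLog:
    """Hypothetical logger for user-triggered description requests."""

    requests: list = field(default_factory=list)

    def on_key(self, key: str, genre: str, timestamp_s: float):
        """Return the requested detail level, or None for unmapped keys."""
        level = KEY_TO_LEVEL.get(key.lower())
        if level is None:
            return None
        # Record genre, level, and playback time of the request.
        self.requests.append((genre, level, timestamp_s))
        return level

    def counts_by_genre(self) -> Counter:
        """Count requests per (genre, level) pair."""
        return Counter((genre, level) for genre, level, _ in self.requests)


log = DescriptionLog()
log.on_key("c", "education", 12.5)
log.on_key("D", "education", 40.0)
log.on_key("x", "education", 41.0)  # ignored: not a mapped key
counts = log.counts_by_genre()
```

In a real player, `on_key` would also pause playback and hand the current frame to a description model; that step is omitted here because the summary does not describe it.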

Describe Now: User-Driven Audio Description for Blind and Low Vision Individuals
