Describe Now: User-Driven Audio Description for Blind and Low Vision Individuals

Maryam Cheema, Hasti Seifi, Pooyan Fazli
2024-11-19
Abstract: Audio descriptions (AD) make videos accessible for blind and low vision (BLV) users by describing visual elements that cannot be understood from the main audio track. AD created by professionals or novice describers is time-consuming and lacks scalability, and it offers BLV viewers little control over the length and content of descriptions and when they receive them. To address this gap, we explore user-driven AI-generated descriptions, where the BLV viewer controls when they receive descriptions. In a study, 20 BLV participants activated audio descriptions for seven different video genres with two levels of detail: concise and detailed. Our results show differences in AD frequency and level of detail BLV users wanted for different videos, their sense of control with this style of AD delivery, its limitations, and variations among BLV users in their AD needs and perception of AI-generated descriptions. We discuss the implications of our findings for future AI-based AD tools.
Human-Computer Interaction
What problem does this paper attempt to address?
The paper addresses how to provide more flexible and personalized audio description (AD) for blind and low vision (BLV) users, so as to improve their experience of watching online videos. Specifically, it focuses on the following aspects:

1. **Limitations of existing audio descriptions**:
   - Pre-recorded audio descriptions made by professional or novice describers are time-consuming to produce and do not scale.
   - Pre-recorded descriptions cannot adapt to users' real-time needs, and users have little control over the length and content of the descriptions or when they receive them.
2. **User-driven AI-generated audio descriptions**:
   - The researchers explore a user-driven approach in which AI generates descriptions on demand, so BLV users can request a description whenever they want one, based on their own preferences and on what they hear in the video.
   - The approach targets the current lack of flexibility and personalization in audio descriptions and aims to provide descriptions that better match user needs.
3. **Different requirements for different video types**:
   - Different types of videos (such as movies and animation, education, and health and fitness) call for descriptions at different frequencies and levels of detail.
   - The study measured how often, and at which level of detail, BLV users requested descriptions for each video type, and gathered their impressions of and feedback on the new approach.
4. **User perception and experience**:
   - The study also examined how BLV users perceived user-driven AI-generated descriptions, including their acceptance of the approach, the positive and negative aspects of using it, and their suggestions for future improvements.
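
To make "frequency and level of detail per video type" concrete, here is a minimal sketch of how such measures could be computed from an interaction log. It is not the paper's analysis: the `ADRequest` record, the `summarize_requests` helper, the normalization by minutes watched, and the genre labels are all hypothetical, and the toy values are illustrative only, not data from the study.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ADRequest:
    """One description request from a hypothetical interaction log."""
    participant: str
    genre: str
    level: str  # "concise" or "detailed"


def summarize_requests(requests, minutes_watched):
    """Return, per genre, requests per minute watched and the share of detailed requests.

    minutes_watched maps genre -> total minutes of that genre watched,
    summed over all participants (a hypothetical normalization).
    """
    counts = Counter(r.genre for r in requests)
    detailed = Counter(r.genre for r in requests if r.level == "detailed")
    summary = {}
    for genre, minutes in minutes_watched.items():
        total = counts[genre]  # Counter returns 0 for genres with no requests
        summary[genre] = {
            "requests_per_min": total / minutes if minutes > 0 else 0.0,
            "detailed_share": detailed[genre] / total if total > 0 else 0.0,
        }
    return summary


# Toy values for illustration only; not data from the study.
log = [
    ADRequest("P1", "education", "concise"),
    ADRequest("P1", "education", "detailed"),
    ADRequest("P2", "fitness", "concise"),
]
minutes = {"education": 4.0, "fitness": 2.0}
summary = summarize_requests(log, minutes)
print(summary["education"])  # {'requests_per_min': 0.5, 'detailed_share': 0.5}
```

Normalizing by minutes watched keeps genres with longer videos from looking more description-hungry simply because they run longer; a real analysis might instead normalize per video or per participant.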
### Research objectives

To answer these questions, the researchers developed a prototype system that lets BLV users request a short or detailed audio description by pressing the C or D key while watching a video. In a study with 20 BLV participants across seven video genres, they collected data on how often and at which level of detail participants requested descriptions for different video types, and analyzed participants' views and experiences of the system.

### Main contributions

- Insights into BLV users' views and experiences of user-driven AI-generated audio descriptions, including their sense of control and the approach's limitations.
- Quantitative data showing how the frequency and level of detail of requested descriptions vary across video types.
- An analysis of users' perceptions and experiences that serves as a reference for designing future AI-based audio description platforms.

Through these efforts, the research aims to advance audio description technology so that it better fits the needs of BLV users and improves their video-watching experience.
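
The on-demand interaction described under Research objectives can be sketched as a small key handler. This is a minimal sketch under stated assumptions, not the prototype's code: the mapping of C to concise and D to detailed is inferred from the key names and the two detail levels, `DescribeOnDemand` and `fake_describe` are hypothetical, and the describer is a placeholder for whatever AI model generates the text. Whether the real system pauses the video or how it speaks the description is not covered here; the handler only logs the request and returns the text to be spoken.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Assumed mapping: C -> concise, D -> detailed.
KEY_TO_LEVEL = {"c": "concise", "d": "detailed"}


@dataclass
class DescribeOnDemand:
    # Hypothetical stand-in for the AI describer: (timestamp in seconds, level) -> text.
    describe: Callable[[float, str], str]
    # Record of (timestamp, level) for every request, e.g. for later frequency analysis.
    log: List[Tuple[float, str]] = field(default_factory=list)

    def on_key(self, key: str, timestamp_s: float) -> Optional[str]:
        """Handle a key press at the current playback time; return text to speak, or None."""
        level = KEY_TO_LEVEL.get(key.lower())
        if level is None:
            return None  # not a description key; ignore it
        self.log.append((timestamp_s, level))
        return self.describe(timestamp_s, level)


def fake_describe(timestamp_s: float, level: str) -> str:
    # Placeholder for a real model call that would describe the frame at this time.
    return f"{level} description at {timestamp_s:.1f}s"


session = DescribeOnDemand(fake_describe)
print(session.on_key("C", 12.0))  # concise description at 12.0s
```

Passing the describer in as a function keeps the key-handling logic independent of any particular model, so the same handler could be tested with a stub, as above, or wired to a real description service.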