A Quality of Experience and Visual Attention Evaluation for 360° videos with non-spatial and spatial audio

Amit Hirway, Yuansong Qiao, Niall Murray
DOI: https://doi.org/10.1145/3650208
2024-03-06
Abstract: This article presents the results of an empirical study that aimed to investigate the influence of various types of audio (spatial and non-spatial) on user quality of experience (QoE) and visual attention in 360° videos. The study compared the head pose, eye gaze, pupil dilations, heart rate and subjective responses of 73 users who watched ten 360° videos with different sound configurations. The configurations evaluated were no sound; non-spatial (stereo) audio; and two spatial sound conditions (first and third-order ambisonics). The videos covered various categories and presented both indoor and outdoor scenarios. The subjective responses were analyzed using an ANOVA (Analysis of Variance) to assess mean differences between sound conditions. Data visualization was also employed to enhance the interpretability of the results. The findings reveal diverse viewing patterns, physiological responses, and subjective experiences among users watching 360° videos with different sound conditions. Spatial audio, in particular third-order ambisonics, garnered heightened attention. This is evident in increased pupil dilation and heart rate. Furthermore, the presence of spatial audio led to more diverse head poses when sound sources were distributed across the scene. These findings have important implications for the development of effective techniques for optimizing processing, encoding, distributing, and rendering content in VR and 360° videos with spatialized audio. These insights are also relevant in the creative realms of content design and enhancement. They provide valuable guidance on how spatial audio influences user attention, physiological responses, and overall subjective experiences. Understanding these dynamics can assist content creators and designers in crafting immersive experiences that leverage spatialized audio to captivate users, enhance engagement, and optimize the overall quality of virtual reality and 360° video content.
The dataset, scripts used for data collection, ffmpeg commands used for processing the videos and the subjective questionnaire and its statistical analysis are publicly available.
computer science, information systems, theory & methods, software engineering
What problem does this paper attempt to address?
The paper studies how different types of audio (non-spatial and spatial) affect user Quality of Experience (QoE) and visual attention in 360° videos. In an empirical study, it compares the head pose, eye gaze, pupil dilation, heart rate, and subjective responses of 73 participants who watched 360° videos under different sound configurations: no sound, non-spatial (stereo) audio, and two spatial audio conditions (first-order and third-order ambisonics). The study found that spatial audio, particularly third-order ambisonics, drew heightened attention, as evidenced by increased pupil dilation and heart rate. Spatial audio also led to more diverse head poses when sound sources were distributed across the scene. These findings matter for optimizing the processing, encoding, distribution, and rendering of virtual reality (VR) and 360° video content, and they offer guidance for content design and enhancement. The results can help content creators and designers use spatial audio to improve immersion, engagement, and overall experience quality.
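The abstract states that subjective responses were analyzed with an ANOVA to assess mean differences between sound conditions. As a rough illustration of that kind of test, the sketch below computes a one-way ANOVA F statistic in plain Python. It assumes a simple between-groups design, which the abstract does not confirm; the function name `one_way_anova_f`, the condition labels, and all rating values are hypothetical placeholders and are not taken from the study or its published scripts.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of scores per condition.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up placeholder ratings, NOT data from the study.
# Keys mirror the four sound conditions named in the abstract.
placeholder_ratings = {
    "no_sound": [2, 3, 2, 3],
    "stereo": [3, 4, 3, 4],
    "first_order_ambisonics": [4, 4, 5, 3],
    "third_order_ambisonics": [4, 5, 5, 4],
}

f_stat, df_b, df_w = one_way_anova_f(list(placeholder_ratings.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The F statistic is then compared against the F distribution with the reported degrees of freedom to obtain a p-value. If every participant rated every condition, a repeated-measures ANOVA would be the more appropriate variant; the published statistical analysis is the authoritative reference for what the authors actually ran.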