Panonut360: A Head and Eye Tracking Dataset for Panoramic Video

Yutong Xu, Junhao Du, Jiahe Wang, Yuwei Ning, Sihan Zhou, Yang Cao
DOI: https://doi.org/10.1145/3625468.3652176
2024-03-26
Abstract: With the rapid development and widespread application of VR/AR technology, maximizing the quality of immersive panoramic video services that match users' personal preferences and habits has become a long-standing challenge. Understanding the saliency region where users focus, based on data collected with HMDs, can promote multimedia encoding, transmission, and quality assessment. At the same time, large-scale datasets are essential for researchers and developers to explore short/long-term user behavior patterns and train AI models related to panoramic videos. However, existing panoramic video datasets often include low-frequency user head or eye movement data through short-term videos only, lacking sufficient data for analyzing users' Field of View (FoV) and generating video saliency regions.
Computer Vision and Pattern Recognition, Human-Computer Interaction, Multimedia
What problem does this paper attempt to address?
The paper primarily addresses the following issues:

1. **Lack of Dataset**: Existing panoramic video datasets often contain only low-frequency head or eye movement data recorded on short videos, which is insufficient for analyzing users' Field of View (FoV) and generating video saliency regions.
2. **Saliency Weight Distribution Issue**: It is usually assumed that the saliency weight within the FoV, i.e. the likelihood that a gaze point lands at a given position, decreases from the FoV center following a Gaussian distribution. However, the observed data does not fully conform to this assumption: especially during extended viewing, users' gaze points deviate systematically from the FoV center.

To address these issues, the authors constructed a head and eye tracking dataset named "Panonut360", which includes detailed records of 50 participants (25 males and 25 females) watching 15 panoramic videos. Most of these videos are in 4K resolution, with durations ranging from 140 to 352 seconds, and the data was collected at 120 Hz. Analyzing this data, the authors found that users' gaze points show a consistent downward deviation relative to the FoV center, so a simple Gaussian centered on the FoV does not describe them well. Based on this observation, they proposed a new saliency weight distribution model named "Panonut", a name reflecting its donut-like shape. Additionally, the authors provide scripts for generating saliency distribution maps, together with a pre-generated set of saliency distribution maps for each video. These resources can help researchers better understand user behavior and apply that understanding to multimedia encoding, transmission, and quality assessment. The dataset is publicly available, which can promote research and development in the field of panoramic videos.
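To make the contrast between the two weighting assumptions concrete, here is a minimal Python sketch. It assumes each gaze sample is expressed as an angular offset (in degrees) from the FoV center, with negative vertical values meaning "below center". The functions `mean_offset`, `gaussian_weight`, and `ring_weight`, and all parameter values, are hypothetical illustrations: `ring_weight` is a generic ring-shaped stand-in with a downward-shifted center, not the Panonut model as defined in the paper, and the sample format is not the dataset's actual file format.

```python
import math
from statistics import mean

# Hypothetical representation: (dx, dy) angular offset in degrees of the eye
# gaze from the FoV (head) center; dy < 0 means the gaze sits below the center.
Sample = tuple[float, float]


def mean_offset(samples: list[Sample]) -> Sample:
    """Average gaze offset from the FoV center; a negative dy indicates a downward bias."""
    return (mean(s[0] for s in samples), mean(s[1] for s in samples))


def gaussian_weight(dx: float, dy: float, sigma: float = 15.0) -> float:
    """Conventional assumption: weight peaks at the FoV center and decays radially."""
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def ring_weight(dx: float, dy: float, offset_y: float = -5.0,
                radius: float = 8.0, width: float = 6.0) -> float:
    """Hypothetical donut-shaped weight: highest on a ring of the given radius
    around a center shifted vertically by offset_y degrees (illustrative values)."""
    r = math.hypot(dx, dy - offset_y)  # distance from the shifted center
    return math.exp(-((r - radius) ** 2) / (2.0 * width * width))


if __name__ == "__main__":
    gaze = [(1.0, -6.0), (-2.0, -4.0), (1.0, -8.0)]
    print("mean offset:", mean_offset(gaze))
    print("gaussian at center:", gaussian_weight(0.0, 0.0))
    print("ring at shifted center:", round(ring_weight(0.0, -5.0), 3))
    print("ring on its peak:", ring_weight(0.0, 3.0))
```

With these made-up parameters, the ring peaks 8° from a center placed 5° below the FoV center, so `ring_weight(0.0, 3.0)` returns 1.0 while the shifted center itself receives a lower weight, whereas `gaussian_weight` always assigns its maximum to the FoV center. Fitting real parameters would require the dataset's gaze records.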