Abstract: In virtual reality (VR), accurate and precise estimates of users’ visual fixations and head movements can improve the quality of experience by allocating more computational resources to analysing and rendering the areas of interest. However, users’ visual exploration behavior remains insufficiently studied in models of VR visual attention. To bridge the gap between saliency prediction for traditional 2D content and for omnidirectional content, we construct a visual attention dataset and propose a visual saliency prediction framework for panoramic videos. Building on instantaneous viewing behavior, we propose a traditional method that adapts 2D saliency models and design a CNN-based model to better predict visual saliency. In the proposed traditional model, the mechanisms of visual attention and viewing behavior are taken into account when computing the edge weights of graphs, which are interpreted as Markov chains. The fraction of visual attention diverted to each high-clarity vision (HCV) area is estimated from the equilibrium distribution of this chain. We also propose a graph-based CNN model. The RGB channels and optical flow form the spatial-temporal units of the HCVs, from which node feature vectors are extracted. Graph convolution is used to learn the mutual information between the node feature vectors of the HCVs while retaining geometric information. The feature vectors are then aligned according to the geometric structure of the equirectangular format, and a feature decoder maps the aligned feature maps to the data distribution. We also construct the dynamic omnidirectional monocular (DOM) saliency dataset, consisting of 64 diverse videos evaluated by 28 people. The subjective results show that instantaneous viewing behavior is important in the VR experience. Extensive experiments on the dataset demonstrate the effectiveness of the proposed framework. The dataset will be released to facilitate future studies on visual saliency prediction for 360-degree content.
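The Markov-chain step in the abstract can be illustrated with a minimal sketch. It assumes a small graph whose nodes stand for HCV areas and whose non-negative edge weights are already given; the weight values below are hypothetical, and the function name `equilibrium_distribution` is introduced only for this illustration. The abstract does not specify how the paper computes its edge weights, so this shows only the generic technique: normalise the weights into a row-stochastic transition matrix and estimate its equilibrium distribution by power iteration.

```python
import numpy as np


def equilibrium_distribution(weights, tol=1e-10, max_iter=10_000):
    """Estimate the equilibrium distribution of the Markov chain
    induced by a matrix of non-negative edge weights.

    Assumes the chain is irreducible and aperiodic (for example, all
    weights strictly positive), so that power iteration converges.
    """
    w = np.asarray(weights, dtype=float)
    if w.ndim != 2 or w.shape[0] != w.shape[1]:
        raise ValueError("weights must be a square matrix")
    if np.any(w < 0):
        raise ValueError("weights must be non-negative")
    row_sums = w.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every node needs at least one outgoing edge")

    # Row-normalise so that each row is a probability distribution.
    transition = w / row_sums

    # Start from the uniform distribution and iterate pi <- pi @ P.
    n = w.shape[0]
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        nxt = pi @ transition
        if np.abs(nxt - pi).sum() < tol:
            pi = nxt
            break
        pi = nxt
    return pi / pi.sum()


# Hypothetical symmetric edge weights between three HCV nodes.
example_weights = np.array([
    [1.0, 2.0, 1.0],
    [2.0, 1.0, 3.0],
    [1.0, 3.0, 1.0],
])
example_pi = equilibrium_distribution(example_weights)
print(example_pi)
```

Each entry of the result can be read as the share of attention assigned to the corresponding node. For symmetric weights, as in this example, the equilibrium share of a node is proportional to the sum of its edge weights.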

WinDB: HMD-free and Distortion-free Panoptic Video Fixation Learning

PerimetryNet: A Multiscale Fine Grained Deep Network for Three-Dimensional Eye Gaze Estimation Using Visual Field Analysis

Panonut360: A Head and Eye Tracking Dataset for Panoramic Video

Fixation Prediction for 360° Video Streaming in Head-Mounted Virtual Reality.

A Fixation-based 360° Benchmark Dataset for Salient Object Detection

Weakly Supervised Visual-Auditory Fixation Prediction with Multigranularity Perception

PVO: Panoptic Visual Odometry.

NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos

FixationNet: Forecasting Eye Fixations in Task-Oriented Virtual Environments.

Viewing Behavior Supported Visual Saliency Predictor for 360 Degree Videos

DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization

An Eye-Tracking Dataset for Visual Attention Modelling in a Virtual Museum Context.

Strategies for enhancing automatic fixation detection in head-mounted eye tracking

Window Detection in Facades Using Heatmap Fusion

Optimizing Fixation Prediction Using Recurrent Neural Networks for 360° Video Streaming in Head-Mounted Virtual Reality

Learned Scanpaths Aid Blind Panoramic Video Quality Assessment

Fixation Prediction for 360° Video Streaming to Head-Mounted Displays

360° Video Viewing Dataset in Head-Mounted Virtual Reality.

Gaze-in-wild: A dataset for studying eye and head coordination in everyday activities

ASOD60K: An Audio-Induced Salient Object Detection Dataset for Panoramic Videos