Abstract: Estimating the salient areas of visual stimuli that are likely to attract viewers’ visual attention is a challenging task because of the high complexity of cognitive behavior in the brain. Many researchers have worked in this field and made considerable progress. Application areas ranging from computer vision and computer graphics to multimedia processing can benefit from saliency detection, because the detected saliency describes the visual importance of different areas of the visual stimuli. For 360 degree visual stimuli, images and videos must record the whole scene in the 3D world, so the resolutions of panoramic images and videos are usually very high. However, when watching 360 degree stimuli, observers can see only the part of the scene inside the view port, which is presented to their eyes through the Head Mounted Display (HMD). Sending the whole video or rendering the whole scene may therefore waste resources. If we can predict the current field of view, streaming and rendering can focus on the part of the scene in that field of view. Furthermore, if we can predict salient areas in the scene, finer processing can be applied to the visually important areas. The prediction of salient regions for traditional images and videos has been extensively studied. However, conventional saliency prediction methods are not fully adequate for 360 degree content, because 360 degree stimuli have some unique characteristics, and related studies in this area are limited. In this paper, we study the problem of predicting the head movement, head–eye motion, and scanpaths of viewers watching 360 degree images in commodity HMDs. Three types of data are specifically analyzed. The first is the head movement data, which can be regarded as the movement of the view port. The second is the head–eye motion data, which combines the motion of the head with the movement of the eyes within the view port. The third is the scanpath data of observers over the entire panorama, which records both position and time information. Our model is designed to predict saliency maps for the first two and scanpaths for the last one. Experimental results demonstrate the effectiveness of our model.

3D Pop-Ups: Omnidirectional image visual saliency prediction based on crowdsourced eye-tracking data in VR

Learning Stereoscopic Visual Attention Model for 3d Video

The prediction of head and eye movement for 360 degree images

An Eye-Tracking Dataset for Visual Attention Modelling in a Virtual Museum Context.

SAL3D: a model for saliency prediction in 3D meshes

FixationNet: Forecasting Eye Fixations in Task-Oriented Virtual Environments.

A Computational Model for Stereoscopic Visual Saliency Prediction

SalientVR: Saliency-Driven Mobile 360-Degree Video Streaming with Gaze Information.

Stereoscopic visual saliency prediction based on stereo contrast and stereo focus

Audio-visual Saliency Prediction for Movie Viewing in Immersive Environments: Dataset and Benchmarks

A Learning-Based Visual Saliency Prediction Model for Stereoscopic 3D Video (LBVS-3D)

A saliency dataset for 360-degree videos

SalNet360: Saliency Maps for omni-directional images with CNN

UrbanVR: An immersive analytics system for context-aware urban design

3D Gaze Vis: Sharing Eye Tracking Data Visualization for Collaborative Work in VR Environment

Panonut360: A Head and Eye Tracking Dataset for Panoramic Video

Saliency Prediction on Omnidirectional Image With Generative Adversarial Imitation Learning

A Deep Model of Visual Attention for Saliency Detection on 3D Objects

Learning High-Quality Navigation and Zooming on Omnidirectional Images in Virtual Reality

POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images