Abstract: Estimating the salient areas of visual stimuli that are likely to attract viewers’ visual attention is a challenging task because cognitive behavior in the brain is highly complex. Many researchers have worked in this field and made considerable progress. Application areas ranging from computer vision and computer graphics to multimedia processing can benefit from saliency detection, because the detected saliency describes the visual importance of different areas of the stimuli. For 360-degree visual stimuli, images and videos must record the whole scene of the 3D world, so panoramic images and videos usually have very high resolutions. However, when watching 360-degree stimuli, observers see only the part of the scene inside the viewport, which is presented to their eyes through a head-mounted display (HMD). Transmitting the whole video or rendering the whole scene may therefore waste resources. If the current field of view can be predicted, streaming and rendering can concentrate on the part of the scene within it. Furthermore, if the salient areas of the scene can be predicted, finer processing can be applied to the visually important areas. Saliency prediction for traditional images and videos has been studied extensively. However, conventional saliency prediction methods are not fully adequate for 360-degree content, because 360-degree stimuli have some unique characteristics, and research in this area remains limited. In this paper, we study the problem of predicting the head movement, head–eye motion, and scanpaths of viewers watching 360-degree images on commodity HMDs. Three types of data are specifically analyzed. The first is head movement data, which can be regarded as the movement of the viewport.
The second is head–eye motion data, which combines the motion of the head with the movement of the eyes within the viewport. The third is the scanpath data of observers over the entire panorama, which records both position and time information. Our model is designed to predict saliency maps for the first two types of data and scanpaths for the third. Experimental results demonstrate the effectiveness of our model.
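
The resource argument above rests on the fact that the viewport, which moves with the head, covers only a small part of the panorama. The following sketch estimates which tiles of an equirectangular frame a viewport actually overlaps, so that a tile-based streamer could fetch only those. It is an illustration under stated assumptions, not part of the paper's model: the viewport is assumed to be rectilinear with a 90°×90° field of view, head orientation is given as yaw and pitch in degrees, the 4×8 tile grid and sampling density are arbitrary, and `viewport_tiles` is a hypothetical name.

```python
import math


def viewport_tiles(yaw_deg, pitch_deg, hfov_deg=90.0, vfov_deg=90.0,
                   rows=4, cols=8, samples=16):
    """Estimate which (row, col) tiles of an equirectangular frame a
    rectilinear viewport centred at (yaw, pitch) overlaps.

    Hypothetical helper: coverage is approximated by casting
    samples x samples rays through the viewport, so a tile touched only
    by a very thin sliver can be missed.
    """
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    tu = math.tan(math.radians(hfov_deg) / 2)  # half-width of the image plane
    tv = math.tan(math.radians(vfov_deg) / 2)  # half-height of the image plane
    tiles = set()
    for i in range(samples):
        for j in range(samples):
            # Ray through the centre of sample cell (i, j). The viewer looks
            # along +x; u grows toward increasing longitude (+y), v grows up (+z).
            u = tu * (2 * (i + 0.5) / samples - 1)
            v = tv * (2 * (j + 0.5) / samples - 1)
            x, y, z = 1.0, u, v
            # Head rotation: pitch (positive = up, x-z plane), then yaw (x-y plane).
            x, z = (x * math.cos(pitch) - z * math.sin(pitch),
                    x * math.sin(pitch) + z * math.cos(pitch))
            x, y = (x * math.cos(yaw) - y * math.sin(yaw),
                    x * math.sin(yaw) + y * math.cos(yaw))
            norm = math.sqrt(x * x + y * y + z * z)
            lon = math.atan2(y, x)                          # [-pi, pi]
            lat = math.asin(max(-1.0, min(1.0, z / norm)))  # [-pi/2, pi/2]
            # Equirectangular tile indices: columns by longitude, rows by
            # latitude (row 0 touches the north pole).
            col = int((lon + math.pi) / (2 * math.pi) * cols) % cols
            row = min(int((math.pi / 2 - lat) / math.pi * rows), rows - 1)
            tiles.add((row, col))
    return tiles


front = viewport_tiles(0, 0)
print(sorted(front), f"{len(front)} of {4 * 8} tiles")
# [(1, 3), (1, 4), (2, 3), (2, 4)] 4 of 32 tiles
```

With these defaults a straight-ahead viewport touches 4 of the 32 tiles. Casting sample rays is simpler than intersecting the curved viewport boundary with tile edges analytically, at the cost of possibly missing tiles that are only grazed.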

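The head–eye data described in the abstract locate a gaze point in two steps: the head orientation fixes the viewport, and the eye position is measured inside it. The sketch below first maps such a sample onto the panorama, then accumulates fixations into an equirectangular saliency map by summing a Gaussian of great-circle distance. Gaussian smoothing of fixations is a common way to build ground-truth saliency maps; it is shown here only as an illustration, not as the authors' prediction model. The rectilinear 90°×90° viewport, the normalised eye coordinates, the `sigma_deg` width, the map size, and all function names are assumptions.

```python
import math


def head_eye_to_sphere(yaw_deg, pitch_deg, ex, ey, hfov_deg=90.0, vfov_deg=90.0):
    """Map an eye position inside the viewport to (lon, lat) in degrees.

    Hypothetical helper. (ex, ey) are normalised viewport coordinates in
    [-1, 1]: (0, 0) is the viewport centre, ex grows toward increasing
    longitude, ey grows upward. Positive pitch means looking up.
    """
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    # Point on the rectilinear image plane, one unit in front of the viewer.
    x = 1.0
    y = ex * math.tan(math.radians(hfov_deg) / 2)
    z = ey * math.tan(math.radians(vfov_deg) / 2)
    # Apply the head rotation: pitch (x-z plane), then yaw (x-y plane).
    x, z = (x * math.cos(pitch) - z * math.sin(pitch),
            x * math.sin(pitch) + z * math.cos(pitch))
    x, y = (x * math.cos(yaw) - y * math.sin(yaw),
            x * math.sin(yaw) + y * math.cos(yaw))
    norm = math.sqrt(x * x + y * y + z * z)
    lat = math.asin(max(-1.0, min(1.0, z / norm)))
    return math.degrees(math.atan2(y, x)), math.degrees(lat)


def great_circle_deg(lon1, lat1, lon2, lat2):
    """Angular distance in degrees between two (lon, lat) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    c = (math.sin(p1) * math.sin(p2)
         + math.cos(p1) * math.cos(p2) * math.cos(dl))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


def saliency_map(fixations, width=72, height=36, sigma_deg=5.0):
    """Equirectangular saliency map (height rows x width cols) from
    (lon, lat) fixations in degrees.

    Each fixation adds a Gaussian of great-circle distance, so blobs stay
    round on the sphere even near the poles. The result sums to 1.
    """
    sal = [[0.0] * width for _ in range(height)]
    for r in range(height):
        lat = 90.0 - (r + 0.5) * 180.0 / height       # pixel-centre latitude
        for c in range(width):
            lon = -180.0 + (c + 0.5) * 360.0 / width  # pixel-centre longitude
            for flon, flat in fixations:
                d = great_circle_deg(lon, lat, flon, flat)
                sal[r][c] += math.exp(-d * d / (2 * sigma_deg ** 2))
    total = sum(map(sum, sal))
    return [[v / total for v in row] for row in sal] if total > 0 else sal


# Head turned 30 degrees, eye at the ex = 1 edge of a 90-degree viewport.
lon, lat = head_eye_to_sphere(30, 0, 1.0, 0.0)
print(round(lon, 6), round(lat, 6))  # 75.0 0.0
```

Measuring distance on the sphere rather than in pixels matters for equirectangular maps, because pixels near the poles cover far less solid angle than pixels at the equator.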
360Spred: Saliency Prediction for 360-Degree Videos Based on 3D Separable Graph Convolutional Networks

SalGCN: Saliency Prediction for 360-Degree Images Based on Spherical Graph Convolutional Networks

Saliency Prediction Network for 360° Videos

Spherical Convolution-based Saliency Detection for FoV Prediction in 360-degree Video Streaming

Predicting 360° Video Saliency: A ConvLSTM Encoder-Decoder Network with Spatio-temporal Consistency

A Spherical Convolution Approach for Learning Long Term Viewport Prediction in 360 Immersive Video

Spherical Convolution empowered Viewport Prediction in 360 Video Multicast with Limited FoV Feedback

SalGFCN: Graph Based Fully Convolutional Network for Panoramic Saliency Prediction

Spherical Convolution empowered FoV Prediction in 360-degree Video Multicast with Limited FoV Feedback

The Prediction of Saliency Map for Head and Eye Movements in 360 Degree Images

Dilated Convolutional Neural Networks for Panoramic Image Saliency Prediction

The prediction of head and eye movement for 360 degree images

Viewing Behavior Supported Visual Saliency Predictor for 360 Degree Videos

MRGAN360: Multi-stage Recurrent Generative Adversarial Network for 360 Degree Image Saliency Prediction

Intra- and Inter-Reasoning Graph Convolutional Network for Saliency Prediction on 360° Images

SVGC-AVA: 360-Degree Video Saliency Prediction with Spherical Vector-Based Graph Convolution and Audio-Visual Attention

Spatio-Temporal Video Segmentation of Static Scenes and Its Applications

Viewport-Dependent Saliency Prediction in 360 Video

SalBiNet360: Saliency Prediction on 360° Images with Local-Global Bifurcated Deep Network

Cross-Modality Fusion and Progressive Integration Network for Saliency Prediction on Stereoscopic 3D Images

Video Saliency Prediction Via Spatio-Temporal Reasoning