Abstract: Estimating the salient areas of visual stimuli, which are likely to attract viewers’ visual attention, is a challenging task because of the high complexity of cognitive processes in the brain. Many researchers have worked in this field and made considerable progress. Application areas ranging from computer vision and computer graphics to multimedia processing can benefit from saliency detection, because a saliency map describes the visual importance of the different areas of a stimulus. Because 360-degree images and videos record the whole surrounding scene in the 3D world, their resolutions are usually very high. However, when watching 360-degree stimuli through a head-mounted display (HMD), observers see only the part of the scene that falls within the viewport. Streaming the whole video or rendering the whole scene therefore wastes resources. If the current field of view can be predicted, streaming and rendering can concentrate on the scene within it. Furthermore, if the salient areas of the scene can be predicted, finer processing can be applied to the visually important areas. Saliency prediction for traditional images and videos has been studied extensively. However, conventional saliency prediction methods are not fully adequate for 360-degree content, because 360-degree stimuli have unique characteristics, and research in this area remains limited. In this paper, we study the problem of predicting the head movement, head–eye motion, and scanpaths of viewers watching 360-degree images in commodity HMDs. Three types of data are analyzed. The first is head movement data, which can be regarded as the movement of the viewport. The second is head–eye motion data, which combines the motion of the head with the movement of the eyes within the viewport. The third is the scanpath data of observers over the entire panorama, which records both position and time information. Our model predicts saliency maps for the first two types of data and scanpaths for the third. Experimental results demonstrate the effectiveness of our model.
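The abstract does not say how ground-truth saliency maps are built from head movement data. A common approach is to project each recorded viewport center onto the equirectangular panorama, accumulate a count map, blur it with a Gaussian, and normalize it so it sums to one. The sketch below illustrates that approach using only the standard library, under assumed conventions: the function names `orientation_to_pixel` and `head_saliency_map`, the yaw/pitch convention, and the `sigma_px` parameter are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def orientation_to_pixel(yaw_deg: float, pitch_deg: float,
                         width: int, height: int) -> Tuple[int, int]:
    """Map a head orientation to an equirectangular pixel (col, row).

    Hypothetical convention: yaw 0 maps to the image center and wraps at
    +/-180 degrees; pitch +90 maps to the top row, -90 to the bottom row.
    """
    u = ((yaw_deg + 180.0) % 360.0) / 360.0   # 0..1, left to right
    v = (90.0 - pitch_deg) / 180.0            # 0..1, top to bottom
    col = min(int(u * width), width - 1)
    row = min(max(int(v * height), 0), height - 1)
    return col, row


def head_saliency_map(samples: Sequence[Tuple[float, float]],
                      width: int, height: int,
                      sigma_px: float = 2.0) -> List[List[float]]:
    """Build a normalized saliency map from (yaw, pitch) head samples.

    Sketch only: counts viewport centers per pixel, applies a separable
    Gaussian blur (wrapping in longitude, truncated in latitude), and
    normalizes the result to sum to 1.
    """
    counts = [[0.0] * width for _ in range(height)]
    for yaw, pitch in samples:
        c, r = orientation_to_pixel(yaw, pitch, width, height)
        counts[r][c] += 1.0

    radius = int(math.ceil(3 * sigma_px))
    kernel = [math.exp(-(d * d) / (2 * sigma_px * sigma_px))
              for d in range(-radius, radius + 1)]

    # Horizontal pass: longitude is periodic, so columns wrap around.
    tmp = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            s = 0.0
            for k, w in enumerate(kernel):
                s += w * counts[r][(c + k - radius) % width]
            tmp[r][c] = s

    # Vertical pass: latitude does not wrap, so rows outside the image are skipped.
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            s = 0.0
            for k, w in enumerate(kernel):
                rr = r + k - radius
                if 0 <= rr < height:
                    s += w * tmp[rr][c]
            out[r][c] = s

    total = sum(sum(row) for row in out)
    if total > 0:
        out = [[x / total for x in row] for row in out]
    return out
```

For example, `head_saliency_map([(0.0, 0.0)] * 3 + [(-180.0, 0.0)], 36, 18)` places the peak at the panorama center and a weaker blob that wraps across the left and right image borders. This sketch does not correct for the oversampling of the poles in the equirectangular projection. A more careful version could weight rows by the cosine of latitude or blur on the sphere instead.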

Learning a Deep Agent to Predict Head Movement in 360-Degree Images

The prediction of head and eye movement for 360 degree images

The Prediction of Saliency Map for Head and Eye Movements in 360 Degree Images

Predicting Head Movement in Panoramic Video: A Deep Reinforcement Learning Approach

Prediction of Head Movement in 360-Degree Videos Using Attention Model

A Spherical Convolution Approach for Learning Long Term Viewport Prediction in 360 Immersive Video

Viewing Behavior Supported Visual Saliency Predictor for 360 Degree Videos

Attention-Based Deep Reinforcement Learning for Virtual Cinematography of 360 Videos

Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Video

Modeling Attention in Panoramic Video: A Deep Reinforcement Learning Approach

ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries

Neural3Points: Learning to Generate Physically Realistic Full-body Motion for Virtual Reality Users

Predicting 360° Video Saliency: A ConvLSTM Encoder-Decoder Network with Spatio-temporal Consistency

Pluto: Motion Detection for Navigation in a VR Headset

Deep Variational Learning for 360° Adaptive Streaming

Unsupervised Learning of Depth and Ego-Motion from Cylindrical Panoramic Video with Applications for Virtual Reality

DeepVR: Deep Reinforcement Learning for Predictive Panoramic Video Streaming

Prediction of Learning Effectiveness Assessment and Experience Evaluation of Panoramic VR Video in Hands-On Education and Design of Its Application

Predictive View Generation to Enable Mobile 360-degree and VR Experiences

Long short-term memory prediction of user's locomotion in virtual reality

Predictive Context-Awareness for Full-Immersive Multiuser Virtual Reality with Redirected Walking