Abstract: 360-degree video streaming has recently gained popularity with the rapid adoption of mobile head-mounted display (HMD) devices in the consumer video market, especially for live broadcasts. 360-degree video streaming introduces new bandwidth and latency challenges in live streaming because of the significantly increased volume of video data. However, most existing bandwidth-saving approaches based on viewport prediction focus only on video-on-demand (VOD) use cases and rely on historical user behavior data, which is not available in live broadcasts. We develop a new viewport prediction scheme for live 360-degree video streaming using video content-based motion tracking and dynamic user interest modeling. To obtain real-time performance, we implement the Gaussian mixture model (GMM) and optical flow algorithms for motion detection and feature tracking. Then, the user's future viewport of interest is generated by a dynamic user interest model that weighs all the features and motion information extracted from the live video frames. Furthermore, we develop two enhancement techniques that take user feedback into account for fast error recovery and view updates. Consequently, our predicted viewports are irregular in shape and dynamically adjusted to cover as much of the actual user viewports as possible, which ensures high prediction accuracy. We evaluate our viewport prediction approach using a public user head movement dataset, which contains the data of 48 users watching six 360-degree videos. The experimental results show that the proposed approach supports sophisticated user head movement patterns and outperforms the existing velocity-based approach in terms of prediction accuracy. In addition, the motion tracking scheme introduces minimal latency overhead, preserving the quality of the live streaming experience.
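
The idea of weighting content motion to form an irregular predicted viewport can be illustrated with a minimal sketch. This is not the authors' model: the tile grid, the exponential proximity decay, the `keep_ratio` threshold, and all function names are assumptions introduced purely for illustration. The per-tile motion scores are taken as given here; in the abstract's pipeline they would come from the GMM and optical flow stages.

```python
from math import exp


def wrapped_col_distance(c1, c2, cols):
    """Horizontal tile distance on a 360-degree frame (columns wrap around)."""
    d = abs(c1 - c2)
    return min(d, cols - d)


def predict_viewport_tiles(motion, center, decay=0.5, keep_ratio=0.5):
    """Return the set of (row, col) tiles predicted to be viewed next.

    motion     -- 2D list of non-negative per-tile motion scores (assumed input)
    center     -- (row, col) tile at the center of the user's current viewport
    decay      -- hypothetical rate at which interest falls off with distance
    keep_ratio -- keep tiles whose weight is at least this fraction of the peak
    """
    rows, cols = len(motion), len(motion[0])
    cr, cc = center
    weights = {}
    for r in range(rows):
        for c in range(cols):
            dist = abs(r - cr) + wrapped_col_distance(c, cc, cols)
            # Interest weight: motion strength discounted by distance from the view.
            weights[(r, c)] = motion[r][c] * exp(-decay * dist)
    peak = max(weights.values())
    if peak <= 0:
        # No motion anywhere: fall back to the current viewport center.
        return {center}
    return {tile for tile, w in weights.items() if w >= keep_ratio * peak}


def coverage(predicted, actual):
    """Fraction of the actual viewport tiles covered by the prediction."""
    if not actual:
        return 0.0
    return len(predicted & actual) / len(actual)


# Example: a 4x8 tile grid with a strong mover just across the wrap-around seam.
motion = [[0.0] * 8 for _ in range(4)]
motion[1][7] = 1.0   # moving object one tile to the left of the view (wraps)
motion[1][0] = 0.9   # moving object at the current viewport center
motion[2][3] = 0.2   # weak, distant motion
predicted = predict_viewport_tiles(motion, center=(1, 0))
actual = {(1, 7), (1, 0), (2, 0)}
score = coverage(predicted, actual)
```

Because tiles are selected by weight rather than by a fixed rectangle, the predicted region can take an irregular shape, and the coverage ratio gives one simple way to express how much of the actual viewport the prediction captured.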

LadderNet: Knowledge Transfer Based Viewpoint Prediction in 360° Video

A Spherical Convolution Approach for Learning Long Term Viewport Prediction in 360 Immersive Video

Very Long Term Field of View Prediction for 360-degree Video Streaming

Saliency Prediction Network for 360° Videos

Towards Low Latency Multi-viewpoint 360° Interactive Video: A Multimodal Deep Reinforcement Learning Approach

Predicting 360° Video Saliency: A ConvLSTM Encoder-Decoder Network with Spatio-temporal Consistency

Optimizing Fixation Prediction Using Recurrent Neural Networks for 360° Video Streaming in Head-Mounted Virtual Reality

CoLive: an Edge-Assisted Online Learning Framework for Viewport Prediction in 360° Live Streaming

Learning-based Prediction and Uplink Retransmission for Wireless Virtual Reality (VR) Network

Viewport Prediction for Live 360-Degree Mobile Video Streaming Using User-Content Hybrid Motion Tracking

Optimizing Mobile-Friendly Viewport Prediction for Live 360-Degree Video Streaming

CUB360: Exploiting Cross-Users Behaviors for Viewport Prediction in 360 Video Adaptive Streaming

Viewport Proposal CNN for 360° Video Quality Assessment

Fixation Prediction for 360° Video Streaming to Head-Mounted Displays

Subtitle-based Viewport Prediction for 360-degree Virtual Tourism Video

A Hybrid Transformer-LSTM Model With 3D Separable Convolution for Video Prediction

Viewport-based CNN: A Multi-task Approach for Assessing 360° Video Quality

Long Short-Term Memory-Based Non-Uniform Coding Transmission Strategy for a 360-Degree Video

ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries

T3VIP: Transformation-based 3D Video Prediction

Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Video