Abstract: Volumetric video, also known as hologram video, is a novel medium that portrays natural content in Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR). It is expected to be the next-generation video technology and a prevalent use case for 5G and beyond wireless communication. Because each user typically watches only a section of the volumetric video, known as the viewport, precise viewport prediction is essential for optimal streaming performance. However, research on this topic is still in its infancy. To this end, this paper proposes a novel approach, named Saliency and Trajectory Viewport Prediction (STVP), which aims to improve the precision of viewport prediction in volumetric video streaming. STVP makes extensive use of video saliency information and viewport trajectory. To our knowledge, this is the first comprehensive study of viewport prediction in volumetric video streaming. In particular, we introduce a novel sampling method, Uniform Random Sampling (URS), to reduce computational complexity while still preserving video features efficiently. We then present a saliency detection technique that incorporates both spatial and temporal information for detecting static, dynamic geometric, and color salient regions. Finally, we fuse saliency and trajectory information to achieve more accurate viewport prediction. We conduct extensive simulations to evaluate the effectiveness of the proposed viewport prediction methods using state-of-the-art volumetric video sequences. The experimental results show the superiority of the proposed method over existing schemes. The dataset and source code will be publicly accessible after acceptance.

What problem does this paper attempt to address?

The paper attempts to address the problem of accurately predicting the user's viewport in volumetric video (also known as holographic video) streaming. Specifically, since each user typically only watches a portion of the volumetric video, precise viewport prediction is crucial for optimizing streaming performance. However, current research in this field is still in its early stages and faces the following technical challenges:

1. **Low efficiency in temporal feature extraction of large-scale point cloud videos**: The density of point cloud videos demands high computational resources, and efficient methods are needed to handle large-scale data for temporal feature extraction.
2. **Lack of effective sampling methods to preserve temporal information**: Existing sampling methods may not effectively retain temporal and spatial information when dealing with large-scale point cloud videos.
3. **Insufficient encoding techniques to capture local visual saliency differences**: Current spatial aggregation methods mainly focus on encoding geometric features, neglecting the importance of luminance information, which limits the accurate representation of spatial saliency features in single-frame point cloud videos.

To address these challenges, the paper proposes a new method, Saliency and Trajectory Viewport Prediction (STVP), which improves viewport prediction accuracy by comprehensively utilizing video saliency and viewport trajectory information. The specific methods include:

- **Uniform Random Sampling (URS)**: Combining spatial uniform partitioning, random sampling, and K-nearest neighbors (KNN) methods to efficiently preserve temporal and spatial information of point cloud videos.
- **Saliency detection techniques**: Detecting static and dynamic geometric and luminance salient regions by combining spatial and temporal information.
- **Trajectory prediction and feature fusion**: Using LSTM networks to analyze historical trajectories and deeply fuse saliency and trajectory information to achieve more accurate viewport prediction.

Through these methods, the paper aims to improve the accuracy of viewport prediction in volumetric video streaming, thereby optimizing the user's viewing experience. Experimental results validate the superiority of this method over existing methods.
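
To make the three ingredients of URS named above (spatial uniform partitioning, random sampling, and KNN) more concrete, the following is a minimal single-frame sketch. It is not the authors' implementation: the function name `uniform_random_sample`, the parameters `cells_per_axis`, `k`, and `seed`, the choice of one sampled point per occupied cell, and the brute-force neighbor search are all assumptions made for illustration, and the sketch does not model how temporal information is preserved across frames.

```python
import numpy as np


def uniform_random_sample(points, cells_per_axis=4, k=8, seed=0):
    """Hypothetical URS-style sampler for one point cloud frame.

    points: (N, C) array whose first three columns are x, y, z.
    Returns the indices of the sampled centers and, for each center,
    the indices of its k nearest points.
    """
    rng = np.random.default_rng(seed)
    xyz = points[:, :3]

    # 1) Spatial uniform partitioning: assign each point to a cell of a
    #    regular grid laid over the bounding box of the frame.
    lo = xyz.min(axis=0)
    extent = np.maximum(xyz.max(axis=0) - lo, 1e-9)  # guard against zero extent
    cell = ((xyz - lo) / extent * cells_per_axis).astype(int)
    cell = np.minimum(cell, cells_per_axis - 1)  # points on the upper boundary
    cell_id = (cell[:, 0] * cells_per_axis + cell[:, 1]) * cells_per_axis + cell[:, 2]

    # 2) Random sampling: keep one randomly chosen point per occupied cell
    #    (an assumption; the paper may sample differently).
    centers = np.array(
        [rng.choice(np.flatnonzero(cell_id == c)) for c in np.unique(cell_id)]
    )

    # 3) KNN: gather the k nearest points around each sampled center.
    #    Brute force is used here for clarity, not efficiency.
    k = min(k, len(xyz))
    dist = np.linalg.norm(xyz[centers][:, None, :] - xyz[None, :, :], axis=2)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    return centers, neighbors


# Usage on a synthetic frame with columns x, y, z, r, g, b.
cloud = np.random.default_rng(1).random((2000, 6))
centers, neighbors = uniform_random_sample(cloud, cells_per_axis=4, k=8)
print(centers.shape, neighbors.shape)
```

Sampling per grid cell keeps the selected centers spread across the whole frame, while the neighbor groups retain local structure around each center. For real point cloud frames, a spatial index such as a k-d tree would likely replace the brute-force distance matrix.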

Viewport Prediction for Volumetric Video Streaming by Exploring Video Saliency and Trajectory Information

A Viewport Prediction Framework for Panoramic Videos

A Self-Attention Model for Viewport Prediction Based on Distance Constraint

Subtitle-based Viewport Prediction for 360-degree Virtual Tourism Video

CaV3: Cache-assisted Viewport Adaptive Volumetric Video Streaming

Viewport-adaptive 360-degree video coding

Single and Sequential Viewports Prediction for 360-Degree Video Streaming

Buffer-Aware Virtual Reality Video Streaming with Personalized and Private Viewport Prediction

Viewport-Dependent Saliency Prediction in 360 Video

Viewing Behavior Supported Visual Saliency Predictor for 360 Degree Videos

Understanding User Behavior in Volumetric Video Watching: Dataset, Analysis and Prediction

Optimal Viewport-Adaptive 360-Degree Video Streaming Against Random Head Movement

Spherical Convolution empowered Viewport Prediction in 360 Video Multicast with Limited FoV Feedback

Viewport Prediction for Live 360-Degree Mobile Video Streaming Using User-Content Hybrid Motion Tracking

A Spherical Convolution Approach for Learning Long Term Viewport Prediction in 360 Immersive Video

Optimizing Mobile-Friendly Viewport Prediction for Live 360-Degree Video Streaming

Viewport Prediction for Panoramic Video with Multi-CNN

Panoramic Video Inter Frame Prediction and Viewport Prediction Based on Background Modeling

Optimal Volumetric Video Streaming with Hybrid Saliency based Tiling

From Capture to Display: A Survey on Volumetric Video

Probabilistic Viewport Adaptive Streaming for 360-Degree Videos