Viewport Prediction for Volumetric Video Streaming by Exploring Video Saliency and Trajectory Information

Jie Li, Zhixin Li, Zhi Liu, Pengyuan Zhou, Richang Hong, Qiyue Li, Han Hu
2024-06-28
Abstract: Volumetric video, also known as hologram video, is a novel medium that portrays natural content in Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR). It is expected to be the next-gen video technology and a prevalent use case for 5G and beyond wireless communication. Considering that each user typically only watches a section of the volumetric video, known as the viewport, it is essential to have precise viewport prediction for optimal performance. However, research on this topic is still in its infancy. To this end, this paper proposes a novel approach, named Saliency and Trajectory Viewport Prediction (STVP), which aims to improve the precision of viewport prediction in volumetric video streaming. STVP extensively utilizes video saliency information and viewport trajectory. To our knowledge, this is the first comprehensive study of viewport prediction in volumetric video streaming. In particular, we introduce a novel sampling method, Uniform Random Sampling (URS), to reduce computational complexity while still efficiently preserving video features. Then we present a saliency detection technique that incorporates both spatial and temporal information for detecting static, dynamic geometric, and color salient regions. Finally, we intelligently fuse saliency and trajectory information to achieve more accurate viewport prediction. We conduct extensive simulations to evaluate the effectiveness of our proposed viewport prediction methods using state-of-the-art volumetric video sequences. The experimental results show the superiority of the proposed method over existing schemes. The dataset and source code will be publicly accessible after acceptance.
Computer Vision and Pattern Recognition, Multimedia
What problem does this paper attempt to address?
The paper attempts to address the problem of accurately predicting the user's viewport in volumetric video (also known as holographic video) streaming. Specifically, since each user typically only watches a portion of the volumetric video, precise viewport prediction is crucial for optimizing streaming performance. However, current research in this field is still in its early stages and faces the following technical challenges:

1. **Low efficiency in temporal feature extraction of large-scale point cloud videos**: The density of point cloud videos demands high computational resources, and efficient methods are needed to handle large-scale data for temporal feature extraction.
2. **Lack of effective sampling methods to preserve temporal information**: Existing sampling methods may not effectively retain temporal and spatial information when dealing with large-scale point cloud videos.
3. **Insufficient encoding techniques to capture local visual saliency differences**: Current spatial aggregation methods mainly focus on encoding geometric features, neglecting the importance of luminance information, which limits the accurate representation of spatial saliency features in single-frame point cloud videos.

To address these challenges, the paper proposes a new method, Saliency and Trajectory Viewport Prediction (STVP), which improves viewport prediction accuracy by comprehensively utilizing video saliency and viewport trajectory information. The specific methods include:

- **Uniform Random Sampling (URS)**: Combining spatial uniform partitioning, random sampling, and K-nearest neighbors (KNN) methods to efficiently preserve temporal and spatial information of point cloud videos.
- **Saliency detection techniques**: Detecting static and dynamic geometric and luminance salient regions by combining spatial and temporal information.
- **Trajectory prediction and feature fusion**: Using LSTM networks to analyze historical trajectories and deeply fuse saliency and trajectory information to achieve more accurate viewport prediction.

Through these methods, the paper aims to improve the accuracy of viewport prediction in volumetric video streaming, thereby optimizing the user's viewing experience. Experimental results validate the superiority of this method over existing methods.
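The three ingredients named for URS above (spatial uniform partitioning, random sampling, and KNN) can be illustrated with a minimal sketch for a single point cloud frame. This is not the authors' implementation: the function name `uniform_random_sample`, the parameters `cell_size`, `k`, and `seed`, the choice of one random center per occupied grid cell, and the brute-force neighbor search are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def uniform_random_sample(points, cell_size, k, seed=0):
    """Hypothetical URS-style sampler for one frame.

    points: list of (x, y, z) tuples.
    Returns (centers, neighborhoods), both expressed as indices into points.
    """
    rng = random.Random(seed)

    # 1. Spatial uniform partitioning: bucket point indices by grid cell.
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        key = (
            math.floor(x / cell_size),
            math.floor(y / cell_size),
            math.floor(z / cell_size),
        )
        cells[key].append(idx)

    # 2. Random sampling: draw one center from each occupied cell
    #    (assumed policy; the paper may sample differently).
    centers = [rng.choice(cells[key]) for key in sorted(cells)]

    # 3. KNN: gather the k closest points to each center (brute force).
    neighborhoods = []
    for c in centers:
        by_distance = sorted(
            range(len(points)),
            key=lambda j: math.dist(points[c], points[j]),
        )
        neighborhoods.append(by_distance[:k])
    return centers, neighborhoods


if __name__ == "__main__":
    frame = [
        (0.1, 0.1, 0.1), (0.2, 0.2, 0.2),
        (1.5, 0.1, 0.1), (1.6, 0.3, 0.2),
        (0.1, 1.5, 0.1),
    ]
    centers, neighborhoods = uniform_random_sample(frame, cell_size=1.0, k=2)
    print(centers, neighborhoods)
```

Because one center is drawn per occupied cell, the sample covers the frame evenly regardless of local point density, while the KNN step keeps a small local patch around each center for later feature encoding. A production version would likely replace the brute-force search with a spatial index such as a k-d tree.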