Optimizing Mobile-Friendly Viewport Prediction for Live 360-Degree Video Streaming

Lei Zhang, Tao Long, Weizhen Xu, Laizhong Cui, Jiangchuan Liu
2024-03-05
Abstract: Viewport prediction is a crucial task for adaptive 360-degree video streaming, as bitrate control algorithms usually require knowledge of which portions of each frame the user is viewing. Various methods have been studied and adopted for viewport prediction, ranging from less accurate statistical tools to highly calibrated deep neural networks. Conventionally, it is difficult to implement sophisticated deep learning methods on mobile devices, which have limited computation capability. In this work, we propose an advanced learning-based viewport prediction approach and carefully design it to introduce minimal transmission and computation overhead for mobile terminals. We also propose a model-agnostic meta-learning (MAML) based saliency prediction network trainer, which provides a few-sample fast training solution that obtains the prediction model by utilizing information from past models. We further discuss how to integrate this mobile-friendly viewport prediction (MFVP) approach into a typical 360-degree live video streaming system by formulating and solving the bitrate adaptation problem. Extensive experimental results show that our prediction approach works in real time for live video streaming and achieves higher accuracy on the mobile end than other existing prediction methods; together with our bitrate adaptation algorithm, it significantly improves streaming QoE in various aspects. We observe that the accuracy of MFVP is 8.1% to 28.7% higher than that of other algorithms, and that MFVP achieves a 3.73% to 14.96% higher average quality level and 49.6% to 74.97% less quality level change than other algorithms.
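The MAML-based trainer mentioned above meta-learns a shared initialization so that a new video's saliency model can be fitted from only a few samples. The following is a minimal sketch of that idea under loud assumptions: it uses a first-order MAML (FOMAML) update and a toy family of 1-D linear-regression tasks in place of the paper's GCN saliency network and video data. The helpers `sample_task`, `sample_points`, `adapt`, `fomaml`, and all hyperparameters are hypothetical.

```python
import random

# Toy first-order MAML (FOMAML) sketch. Hypothetical stand-in: each "task"
# is a 1-D line y = a*x + c instead of a per-video saliency network.

def mse_and_grad(w, b, xs, ys):
    """Mean squared error of y_hat = w*x + b and its gradient w.r.t. (w, b)."""
    n = len(xs)
    loss = gw = gb = 0.0
    for x, y in zip(xs, ys):
        err = w * x + b - y
        loss += err * err
        gw += 2.0 * err * x
        gb += 2.0 * err
    return loss / n, gw / n, gb / n


def sample_task(rng):
    """Hypothetical task family: slope a in [2, 3], intercept c in [-1, 1]."""
    return rng.uniform(2.0, 3.0), rng.uniform(-1.0, 1.0)


def sample_points(rng, task, k):
    """Draw k noise-free (x, y) samples from one task."""
    a, c = task
    xs = [rng.uniform(-1.0, 1.0) for _ in range(k)]
    return xs, [a * x + c for x in xs]


def adapt(w, b, xs, ys, alpha=0.1, steps=5):
    """Inner loop: a few plain gradient-descent steps on the support set."""
    for _ in range(steps):
        _, gw, gb = mse_and_grad(w, b, xs, ys)
        w, b = w - alpha * gw, b - alpha * gb
    return w, b


def fomaml(meta_iters=500, k=10, beta=0.1, seed=0):
    """Outer loop: move the shared initialization along the query-set
    gradient evaluated at the task-adapted parameters (first-order MAML)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(meta_iters):
        task = sample_task(rng)
        sx, sy = sample_points(rng, task, k)  # few-shot support set
        qx, qy = sample_points(rng, task, k)  # query set from the same task
        aw, ab = adapt(w, b, sx, sy)
        _, gw, gb = mse_and_grad(aw, ab, qx, qy)
        w, b = w - beta * gw, b - beta * gb
    return w, b


if __name__ == "__main__":
    w0, b0 = fomaml()
    print(f"meta-learned init: w={w0:.3f}, b={b0:.3f}")
```

Starting from the meta-learned `(w, b)`, five gradient steps on ten support points typically fit a new line far better than the same five steps from `(0, 0)`; in MFVP the analogous benefit would be fitting a new live video's saliency network quickly. The first-order variant is chosen here only because it avoids second-order derivatives; this summary does not state whether the paper's trainer does the same.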
Multimedia; Image and Video Processing
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper aims to resolve the conflict between the accuracy of viewport prediction in live 360-degree video streaming and its feasibility on mobile devices. Viewport prediction is a key task in adaptive 360-degree video streaming because it directly determines how bandwidth is allocated and, in turn, the quality of the user experience. Traditional viewport prediction methods tend to be either lightweight enough for mobile devices but inaccurate, or accurate but too computationally expensive to run on them. The paper therefore proposes an advanced learning-based mobile-friendly viewport prediction (MFVP) method that maintains high accuracy while minimizing transmission and computation overhead, making it suitable for mobile terminals.

### Main Contributions

1. **Proposing MFVP**: A viewport prediction scheme built on an advanced Graph Convolutional Network (GCN) and a modified Long Short-Term Memory (LSTM) network. The GCN generates saliency maps of the regions users are likely to find interesting, and the LSTM combines historical viewport information with these saliency maps to predict future viewports.
2. **Fast Training Method**: A few-shot fast trainer based on Model-Agnostic Meta-Learning (MAML) quickly trains the saliency prediction network for each new video, so that the model adapts to new content during real-time live streaming.
3. **Optimized Implementation**: MFVP runs in real time on mobile devices thanks to reduced computation and transmission costs. Specific measures include lowering the data sampling frequency and compressing both the saliency maps and the learning models.
4. **System Integration**: MFVP is integrated into a typical adaptive bitrate streaming system together with an efficient bitrate adaptation algorithm, and experiments verify its superior performance in live 360-degree video streaming.
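
Contributions 1 and 4 meet at one point: the predicted viewport decides which parts of the frame deserve higher quality. The sketch below illustrates that data flow under stated assumptions: an equirectangular frame split into a `rows` x `cols` tile grid, a predicted viewport center that is simply given as input (the GCN/LSTM predictor is not modeled), and a hypothetical greedy rule that raises all viewport tiles together under a bandwidth budget. The function names, bitrate ladder, and budget are illustrative and are not the paper's bitrate adaptation formulation.

```python
# Hypothetical sketch: map a predicted viewport to equirectangular tiles,
# then greedily spend a bandwidth budget on those tiles. This is NOT the
# paper's bitrate adaptation algorithm; it only illustrates the data flow.

def wrap_deg(angle):
    """Wrap an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def viewport_tiles(yaw, pitch, fov_h, fov_v, rows, cols):
    """Return the (row, col) tiles whose extent overlaps the viewport.

    Overlap is tested on the flat equirectangular plane (a simplification
    that ignores spherical distortion near the poles); yaw wraps at +/-180.
    """
    tile_w, tile_h = 360.0 / cols, 180.0 / rows
    tiles = set()
    for r in range(rows):
        center_pitch = 90.0 - (r + 0.5) * tile_h  # row 0 is the top of the frame
        if abs(center_pitch - pitch) >= fov_v / 2 + tile_h / 2:
            continue
        for c in range(cols):
            center_yaw = -180.0 + (c + 0.5) * tile_w
            if abs(wrap_deg(center_yaw - yaw)) < fov_h / 2 + tile_w / 2:
                tiles.add((r, c))
    return tiles


def allocate_levels(view_tiles, rows, cols, ladder, budget):
    """Greedy quality allocation under a total bitrate budget.

    Every tile starts at the lowest level of `ladder` (per-tile bitrates in
    ascending order); viewport tiles are then raised together one level at a
    time, keeping quality uniform inside the viewport.
    """
    levels = {(r, c): 0 for r in range(rows) for c in range(cols)}
    used = ladder[0] * rows * cols
    if used > budget:
        raise ValueError("budget cannot cover the lowest level for every tile")
    for lvl in range(1, len(ladder)):
        extra = (ladder[lvl] - ladder[lvl - 1]) * len(view_tiles)
        if used + extra > budget:
            break
        for t in view_tiles:
            levels[t] = lvl
        used += extra
    return levels, used


if __name__ == "__main__":
    view = viewport_tiles(yaw=0.0, pitch=0.0, fov_h=90.0, fov_v=90.0, rows=4, cols=8)
    levels, used = allocate_levels(view, 4, 8, ladder=[0.2, 0.5, 1.0, 2.0], budget=12.0)
    print(sorted(view), f"{used:.1f} Mbps")
```

With a 90°×90° viewport centered at yaw 0° and pitch 0° on a 4×8 grid, this selects the four central tiles (rows 1-2, columns 3-4) and raises them to ladder index 2, using 9.6 of the 12 Mbps budget, since the next level would need 13.6. Raising viewport tiles in lockstep is one simple way to limit visible quality changes; a real allocator would likely also spend leftover bandwidth on neighboring tiles to hedge against prediction errors.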
### Experimental Results

Experimental results show that MFVP's prediction accuracy is 8.1% to 28.7% higher than that of other existing algorithms, its average quality level is 3.73% to 14.96% higher, and its quality level variation is 49.6% to 74.97% lower. These results demonstrate MFVP's significant advantages in improving the user experience of live 360-degree video streaming.