Abstract: Immersive video applications, in which users freely navigate a virtualized 3D environment for entertainment, productivity, training, and so on, are growing rapidly. Fundamentally, such a system can be facilitated by an interactive Gigapixel Video Streaming (iGVS) platform spanning array camera capture to end-user interaction. This interactive system demands a large amount of network bandwidth to sustain reliable service provisioning, which hinders its mass-market adoption. We therefore propose to segment the gigapixel scene into non-overlapping spatial tiles, each covering only a sub-region of the entire scene. One or more tiles represent the instantaneous viewport of interest to a specific user. Tiles are then encoded at a variety of quality scales using various combinations of spatial, temporal, and amplitude resolutions (STAR), and are typically encapsulated into temporally aligned tile video chunks (or simply chunks). Chunks at different quality levels can be processed in parallel for real-time operation. With this setup, heterogeneous users can simultaneously access diverse chunk combinations according to their requests, and viewport-adaptive content navigation in an immersive space can be realized by adapting multiscale chunks properly under bandwidth constraints. A series of computational vision models, which measure the perceptual quality of viewport video in terms of its quality scales, adaptation factors, and peripheral vision thresholds, is devised to prepare and guide chunk adaptation toward the best perceptual quality index. Furthermore, in response to time-varying network conditions, a deep reinforcement learning (DRL) based adaptive real-time streaming (ARS) scheme is developed that learns future decisions from historical network states to maximize the overall quality of experience (QoE) in a practical Internet-based streaming scenario.
Our experiments reveal that the average QoE can be improved by about 60% and its standard deviation reduced by approximately 30%, in comparison with the popular Google congestion control algorithm widely adopted in existing adaptive streaming systems, demonstrating the efficiency of our multiscale accelerated iGVS for immersive video applications.
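
The tiling idea in the abstract, where a viewport is represented by one or more non-overlapping spatial tiles, can be illustrated with a minimal sketch. The function name `tiles_for_viewport`, the uniform rectangular tile grid, and the pixel-coordinate viewport are all assumptions made for illustration; the abstract does not specify the tile layout or how viewports are described.

```python
def tiles_for_viewport(scene_w, scene_h, tile_w, tile_h, vx, vy, vw, vh):
    """Return (row, col) indices of the tiles that a viewport overlaps.

    Hypothetical helper: assumes a uniform grid of non-overlapping tiles and
    a viewport given as a pixel rectangle with top-left corner (vx, vy).
    """
    # Clamp the viewport rectangle to the scene bounds.
    x0 = max(0, vx)
    y0 = max(0, vy)
    x1 = min(scene_w, vx + vw)  # exclusive right edge
    y1 = min(scene_h, vy + vh)  # exclusive bottom edge
    if x0 >= x1 or y0 >= y1:
        return []  # viewport lies entirely outside the scene

    # Integer division maps pixel coordinates to tile indices.
    col0, col1 = x0 // tile_w, (x1 - 1) // tile_w
    row0, row1 = y0 // tile_h, (y1 - 1) // tile_h
    return [(r, c)
            for r in range(row0, row1 + 1)
            for c in range(col0, col1 + 1)]


# An 8000x4000 scene cut into 1000x1000 tiles; the viewport spans four tiles.
needed = tiles_for_viewport(8000, 4000, 1000, 1000, 1500, 500, 1200, 800)
print(needed)
```

Only the tiles returned here would need to be fetched for that viewport. Which quality scale to request for each of them is a separate decision, which the abstract assigns to the perceptual models and the DRL-based scheme.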

CaV3: Cache-assisted Viewport Adaptive Volumetric Video Streaming

VRCT: A Viewport Reconstruction-Based 360° Video Caching Solution for Tile-Adaptive Streaming

Viewport Prediction for Volumetric Video Streaming by Exploring Video Saliency and Trajectory Information

LiveVV: Human-Centered Live Volumetric Video Streaming System

Viewport-Aware Deep Reinforcement Learning Approach for 360° Video Caching

Toward Adaptive Volumetric Video Streaming: A Joint Network-Viewport Adaptation Framework

Design and Analysis of MEC- and Proactive Caching-Based 360 Mobile VR Video Streaming

Understanding User Behavior in Volumetric Video Watching: Dataset, Analysis and Prediction

Interactive Gigapixel Video Streaming Via Multiscale Acceleration

Video Super-Resolution and Caching - an Edge-Assisted Adaptive Video Streaming Solution

VMP360

Optimal Viewport-Adaptive 360-Degree Video Streaming Against Random Head Movement

Spatial Perceptual Quality Aware Adaptive Volumetric Video Streaming

Multi-view Video Coding Based on View Prediction

Optimal Volumetric Video Streaming with Hybrid Saliency based Tiling

Spatial Visibility and Temporal Dynamics: Revolutionizing Field of View Prediction in Adaptive Point Cloud Video Streaming

Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention

Probabilistic Viewport Adaptive Streaming for 360-Degree Videos

Synergistic Temporal-Spatial User-Aware Viewport Prediction for Optimal Adaptive 360-Degree Video Streaming

Progressive Frame Patching for FoV-based Point Cloud Video Streaming

FSVVD: A Dataset of Full Scene Volumetric Video