CaV3: Cache-assisted Viewport Adaptive Volumetric Video Streaming

Junhua Liu, Boxiang Zhu, Fangxin Wang, Yili Jin, Wenyi Zhang, Zihan Xu, Shuguang Cui
DOI: https://doi.org/10.1109/vr55154.2023.00033
2023-01-01
Abstract: Volumetric video (VV) has recently emerged as a new form of video application that provides a photorealistic, immersive 3D viewing experience with six degrees of freedom (6 DoF) and enables applications such as VR, AR, and the Metaverse. A key problem is how to stream the enormous volume of VV data over a network with limited bandwidth. Existing works mostly focus on predicting the viewport for tiling-based adaptive VV streaming, which yields only limited resource savings. We argue that content repeatability in the viewport can be further leveraged and, for the first time, propose a client-side cache-assisted strategy that buffers VV tiles likely to reappear in the near future, reducing redundant VV content transmission. The key challenges are threefold: (1) feature extraction and mining in the 6 DoF VV context, (2) accurate long-term viewing pattern estimation, and (3) optimal caching scheduling with limited capacity. In this paper, we propose CaV3, an integrated cache-assisted viewport adaptive VV streaming framework that addresses these challenges. CaV3 employs a Long-short term Sequential prediction model (LSTSP) that achieves accurate short-term, mid-term, and long-term viewing pattern prediction with a multi-modal fusion model by capturing the viewer's behavior inertia, current attention, and subjective intention. CaV3 also contains a contextual multi-armed bandit (MAB)-based caching adaptation algorithm (CCA) that fully utilizes the viewing pattern and solves the optimal caching problem with a proven upper bound on regret. Whereas existing VV datasets contain only single or co-located objects, we collect, for the first time, a comprehensive dataset with sufficient practical unbounded 360° scenes. Extensive evaluation on this dataset confirms the superiority of CaV3, which outperforms the SOTA algorithm by 15.6%-43% in viewport prediction and 13%-40% in system utility.
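To make the idea of contextual-bandit caching concrete, the following is a minimal sketch and not the paper's CCA algorithm, whose details the abstract does not give. It swaps in a generic shared-parameter LinUCB scorer: each candidate tile has a context vector, the scorer estimates an optimistic reuse reward, and the cache keeps the top-scoring tiles up to its capacity. The class name `LinUCBTileCache`, the feature layout, the reward definition, and the `alpha` exploration weight are all assumptions introduced for illustration.

```python
import math


class LinUCBTileCache:
    """Hypothetical sketch: shared-parameter LinUCB choosing tiles for a fixed-size cache."""

    def __init__(self, dim, capacity, alpha=1.0):
        self.dim = dim            # length of each tile's context vector (assumed)
        self.capacity = capacity  # maximum number of tiles the cache may hold
        self.alpha = alpha        # exploration weight (assumed hyperparameter)
        # A_inv starts as the identity (ridge term of 1); b accumulates reward-weighted contexts.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _matvec(self, M, x):
        return [sum(M[i][j] * x[j] for j in range(self.dim)) for i in range(self.dim)]

    def score(self, x):
        """Upper confidence bound on the expected reuse reward for context x."""
        theta = self._matvec(self.A_inv, self.b)
        A_inv_x = self._matvec(self.A_inv, x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(xi * v for xi, v in zip(x, A_inv_x)), 0.0))
        return mean + self.alpha * width

    def select(self, contexts):
        """contexts maps tile id -> context vector; returns the set of tile ids to cache."""
        ranked = sorted(contexts, key=lambda t: self.score(contexts[t]), reverse=True)
        return set(ranked[: self.capacity])

    def update(self, x, reward):
        """Sherman-Morrison rank-one update after observing a reward for context x."""
        A_inv_x = self._matvec(self.A_inv, x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, A_inv_x))
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[i][j] -= A_inv_x[i] * A_inv_x[j] / denom
        for i in range(self.dim):
            self.b[i] += reward * x[i]
```

In use, one would call `select` with the context vectors of candidate tiles before each caching decision, then call `update` with a reward (for example, 1.0 if a cached tile was actually viewed again and 0.0 otherwise) once the outcome is known. How CaV3 builds its contexts from LSTSP predictions and how it defines reward are not specified in the abstract.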