V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians

Penghao Wang, Zhirui Zhang, Liao Wang, Kaixin Yao, Siyuan Xie, Jingyi Yu, Minye Wu, Lan Xu
2024-09-23
Abstract: Experiencing high-fidelity volumetric video as seamlessly as 2D videos is a long-held dream. However, current dynamic 3DGS methods, despite their high rendering quality, face challenges in streaming on mobile devices due to computational and bandwidth constraints. In this paper, we introduce V^3 (Viewing Volumetric Videos), a novel approach that enables high-quality mobile rendering through the streaming of dynamic Gaussians. Our key innovation is to view dynamic 3DGS as 2D videos, facilitating the use of hardware video codecs. Additionally, we propose a two-stage training strategy to reduce storage requirements with rapid training speed. The first stage employs hash encoding and a shallow MLP to learn motion, then reduces the number of Gaussians through pruning to meet the streaming requirements, while the second stage fine-tunes the other Gaussian attributes using a residual entropy loss and a temporal loss to improve temporal continuity. This strategy, which disentangles motion and appearance, maintains high rendering quality with compact storage requirements. We also design a multi-platform player to decode and render 2D Gaussian videos. Extensive experiments demonstrate the effectiveness of V^3, which outperforms other methods by enabling high-quality rendering and streaming on common devices, something not demonstrated before. As the first to stream dynamic Gaussians on mobile devices, our companion player offers users an unprecedented volumetric video experience, including smooth scrolling and instant sharing. Our project page with source code is available at https://authoritywang.github.io/v3/.
Computer Vision and Pattern Recognition, Graphics
What problem does this paper attempt to address?
The paper addresses the problem of viewing high-fidelity volumetric video on mobile devices as seamlessly as ordinary 2D video. Existing dynamic 3D Gaussian Splatting (3DGS) methods render at high quality, but streaming them to mobile devices remains difficult because of computational and bandwidth limits. The paper proposes V^3, which achieves high-quality mobile rendering by streaming dynamic Gaussians. Its key innovation is to treat dynamic 3DGS as 2D video: dynamic Gaussian sequences are represented as compact 2D Gaussian videos, so hardware video codecs can stream and decode them efficiently and a range of portable devices can render them. A two-stage training strategy reduces storage requirements while keeping training fast. The paper also presents a multi-platform player that supports real-time playback and streaming of volumetric video across different devices. In summary, V^3 aims to stream and render dynamic 3DGS efficiently on mobile devices to deliver a high-fidelity volumetric video viewing experience.
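The idea of treating Gaussian attributes as 2D video frames can be illustrated with a minimal sketch. The abstract does not specify the packing layout or quantization scheme, so everything here is an assumption made for illustration: the function names `pack_attribute` and `unpack_attribute` are hypothetical, and the 8-bit min-max quantization and row-major layout are simple stand-ins rather than the authors' method. The sketch only shows why such a representation suits a video codec: each per-Gaussian attribute becomes one 8-bit image per time step.

```python
import numpy as np

def pack_attribute(values, width):
    """Quantize one per-Gaussian attribute to 8 bits and lay it out as a 2D image.

    Assumption: plain min-max quantization and row-major layout, chosen for
    illustration only; the paper's actual scheme may differ.
    """
    values = np.asarray(values, dtype=np.float32)
    lo, hi = float(values.min()), float(values.max())
    scale = (hi - lo) if hi > lo else 1.0
    q = np.round((values - lo) / scale * 255.0).astype(np.uint8)
    height = -(-q.size // width)  # ceiling division
    frame = np.zeros(height * width, dtype=np.uint8)  # zero-pad the last row
    frame[:q.size] = q
    return frame.reshape(height, width), (lo, hi)

def unpack_attribute(frame, bounds, count):
    """Invert pack_attribute, up to 8-bit quantization error."""
    lo, hi = bounds
    scale = (hi - lo) if hi > lo else 1.0
    flat = frame.reshape(-1)[:count].astype(np.float32)
    return flat / 255.0 * scale + lo

# Example: 10 hypothetical opacity values packed into a 4-pixel-wide frame.
values = np.linspace(-1.0, 1.0, 10)
frame, bounds = pack_attribute(values, width=4)
restored = unpack_attribute(frame, bounds, count=values.size)
```

Stacking one such frame per time step yields an ordinary 8-bit image sequence. How well a codec compresses it depends on how similar consecutive frames are, which is presumably why the training strategy includes a temporal loss.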