GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time

Hao Li, Yuanyuan Gao, Chenming Wu, Dingwen Zhang, Yalun Dai, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han
DOI: https://doi.org/10.1007/978-3-031-73209-6_19
2024-01-01
Abstract: This paper presents GGRt, a novel approach to generalizable novel view synthesis that alleviates the need for real camera poses, complexity in processing high-resolution images, and lengthy optimization processes, thus facilitating stronger applicability of 3D Gaussian Splatting (3D-GS) in real-world scenarios. Specifically, we design a novel joint learning framework that consists of an Iterative Pose Optimization Network (IPO-Net) and a Generalizable 3D-Gaussians (G-3DG) model. With the joint learning mechanism, the proposed framework can inherently estimate robust relative pose information from the image observations and thus primarily alleviate the requirement of real camera poses. Moreover, we implement a deferred back-propagation mechanism that enables high-resolution training and inference, overcoming the resolution constraints of previous methods. To enhance the speed and efficiency, we further introduce a progressive Gaussian cache module that dynamically adjusts during training and inference. As the first pose-free generalizable 3D-GS framework, GGRt achieves inference at ≥ 5 FPS and real-time rendering at ≥ 100 FPS. Through extensive experimentation, we demonstrate that our method outperforms existing NeRF-based pose-free techniques in terms of inference speed and effectiveness. It can also approach the real pose-based 3D-GS methods. Our contributions provide a significant leap forward for the integration of computer vision and computer graphics into practical applications, offering state-of-the-art results on LLFF, KITTI, and Waymo Open datasets and enabling real-time rendering for immersive experiences.