GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views

Boyao Zhou,Shunyuan Zheng,Hanzhang Tu,Ruizhi Shao,Boning Liu,Shengping Zhang,Liqiang Nie,Yebin Liu
2024-11-18
Abstract:Differentiable rendering techniques have recently shown promising results for free-viewpoint video synthesis of characters. However, such methods, either Gaussian Splatting or neural implicit rendering, typically necessitate per-subject optimization which does not meet the requirement of real-time rendering in an interactive application. We propose a generalizable Gaussian Splatting approach for high-resolution image rendering under a sparse-view camera setting. To this end, we introduce Gaussian parameter maps defined on the source views and directly regress Gaussian properties for instant novel view synthesis without any fine-tuning or optimization. We train our Gaussian parameter regression module on human-only data or human-scene data, jointly with a depth estimation module to lift 2D parameter maps to 3D space. The proposed framework is fully differentiable with both depth and rendering supervision or with only rendering supervision. We further introduce a regularization term and an epipolar attention mechanism to preserve geometry consistency between two source views, especially when neglecting depth supervision. Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the paper attempts to solve The paper aims to solve the real - time rendering problem of generating free - viewpoint videos (FVV) from sparse - view inputs. Specifically, the paper proposes a general high - resolution image rendering method that can achieve instant new - view synthesis under a sparse camera setup without the need for optimization or fine - tuning for each scene. The main problems can be summarized as follows: 1. **Requirement for real - time rendering**: Existing differentiable rendering techniques (such as Gaussian point - cloud rendering and neural implicit rendering) usually need to be optimized for each object, which does not meet the requirements of real - time interactive applications. 2. **High - resolution rendering**: Under sparse views, how to efficiently generate high - quality high - resolution images, especially in the absence of depth supervision. 3. **Geometric consistency**: How to maintain geometric consistency between two source views, especially in the absence of depth supervision. ### Solutions The paper proposes a general Gaussian point - cloud rendering method (GPS - Gaussian+), which solves the above problems through the following key techniques: 1. **Pixel - level Gaussian parameter map**: Define a Gaussian parameter map (including depth residual, color, scale, rotation, and transparency) on the source - view image plane, thereby achieving a pixel - level 3D Gaussian representation. 2. **Binocular stereo matching**: Utilize binocular stereo - matching techniques as geometric cues and combine with the 3D Gaussian rendering pipeline to achieve efficient depth estimation. 3. **Fully - differentiable framework**: The entire framework is fully - differentiable, and the depth - estimation module and the Gaussian - parameter - regression module can be jointly trained, while using the rendering loss and the depth loss (if there is depth supervision). 4. **Geometric regularization**: Introduce a geometric regularization term and an epipolar - attention mechanism to maintain the geometric consistency between two source views in the absence of depth supervision. 5. **Real - time performance**: Through an efficient 2D convolutional network and fast - rendering techniques, generate high - fidelity free - viewpoint videos at a speed of approximately 25 FPS on modern graphics cards. ### Experimental results The experimental results show that this method outperforms the existing state - of - the - art methods on multiple datasets, not only performs excellently in rendering quality but also can achieve real - time rendering. In particular, this method also shows good generalization ability when dealing with unseen characters and complex scenes. ### Summary The paper successfully solves the real - time rendering problem of generating free - viewpoint videos from sparse - view inputs by proposing a general Gaussian point - cloud rendering method, providing a new solution for real - time interactive applications.