GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

Shunyuan Zheng, Boyao Zhou, Ruizhi Shao, Boning Liu, Shengping Zhang, Liqiang Nie, Yebin Liu
2024-04-16
Abstract: We present a new approach, termed GPS-Gaussian, for synthesizing novel views of a character in a real-time manner. The proposed method enables 2K-resolution rendering under a sparse-view camera setting. Unlike the original Gaussian Splatting or neural implicit rendering methods that necessitate per-subject optimizations, we introduce Gaussian parameter maps defined on the source views and regress directly Gaussian Splatting properties for instant novel view synthesis without any fine-tuning or optimization. To this end, we train our Gaussian parameter regression module on a large amount of human scan data, jointly with a depth estimation module to lift 2D parameter maps to 3D space. The proposed framework is fully differentiable and experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses real-time, high-fidelity novel view synthesis (NVS), specifically of human characters captured with a sparse-view camera setup. It proposes a method named GPS-Gaussian, which synthesizes 2K-resolution novel-view images in real time without any fine-tuning or optimization. Existing methods such as ENeRF, FloRen and 3D-GS usually require dense input views or accurate proxy geometries, whereas GPS-Gaussian generates high-quality images from sparse views while maintaining real-time performance.

### Core contributions of the paper:

1. **Generalizable 3D Gaussian formulation**: 3D Gaussian points are constructed from pixel-wise Gaussian parameter maps defined on the 2D source image planes. The Gaussian parameters are regressed directly in a forward pass, which avoids per-subject optimization.
2. **Fully differentiable framework**: The framework consists of an iterative depth estimation module and a Gaussian parameter regression module. The intermediate predicted depth map connects the two components and lets them benefit from joint training.
3. **Real-time novel view synthesis system**: The resulting NVS system achieves real-time 2K-resolution rendering by directly regressing the Gaussian parameter maps, with no fine-tuning or optimization for new scenes.

### Method overview:

- **View selection and depth estimation**: Select two adjacent source views and extract features through a shared image encoder. Then use a binocular depth estimator to predict the depth maps of the source views.
- **Pixel-wise Gaussian parameter prediction**: Based on the predicted depth maps and the source RGB images, predict the position, color, rotation, scale and opacity of each Gaussian point.
- **Joint training and differentiable rendering**: Lift the predicted Gaussian parameter maps from the 2D image plane to 3D space, aggregate the Gaussian points from the two views, and render the target view with differentiable rendering. The entire framework supports end-to-end joint training.

### Experimental results:

- **Quantitative comparison**: On THuman2.0, Twindom and a self-collected real-world dataset, GPS-Gaussian outperforms the compared methods on metrics such as PSNR, SSIM and LPIPS, and renders faster.
- **Qualitative analysis**: On occlusions and slender structures (such as hockey sticks and robes), GPS-Gaussian is more robust and preserves more detail.
- **Ablation experiments**: The ablations verify that joint training and the depth encoder both matter for rendering quality and depth estimation accuracy.

In conclusion, the paper proposes an efficient, generalizable method for real-time novel view synthesis that generates high-quality images from sparse views and has broad application prospects.
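
The lifting step from the method overview, where per-pixel predictions on the 2D image plane become 3D Gaussian positions, can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a pinhole camera, depth measured along the camera z-axis, and camera-to-world pose matrices, and the function name `unproject_depth` and all numeric values are hypothetical.

```python
import numpy as np

def unproject_depth(depth, K, cam_to_world):
    """Lift an (H, W) depth map to (H*W, 3) world-space points.

    Assumptions (not from the paper): pinhole intrinsics K (3x3),
    z-depth, and a 4x4 camera-to-world rigid transform.
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns (x), v indexes rows (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)
    pix = pix.reshape(-1, 3).astype(np.float64)
    # Back-project each pixel: X_cam = depth * K^{-1} [u, v, 1]^T
    rays = pix @ np.linalg.inv(K).T
    pts_cam = rays * depth.reshape(-1, 1)
    # Transform camera-space points into world space.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t

# Toy example: two source views with constant depth (hypothetical values).
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
pose_left = np.eye(4)
pose_right = np.eye(4)
pose_right[:3, 3] = [1.0, 0.0, 0.0]  # right camera shifted along x

depth = np.full((4, 6), 2.0)
pts_left = unproject_depth(depth, K, pose_left)
pts_right = unproject_depth(depth, K, pose_right)

# Aggregate the points from both views into one set of Gaussian centers.
gaussian_xyz = np.concatenate([pts_left, pts_right], axis=0)
print(gaussian_xyz.shape)
```

In this sketch every source pixel contributes one point, so the remaining per-pixel parameters (color, rotation, scale, opacity) would be flattened and concatenated in the same order to stay aligned with `gaussian_xyz`.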