Abstract: Recent progress in neural rendering has brought forth pioneering methods, such as NeRF and Gaussian Splatting, which revolutionize view rendering across various domains like AR/VR, gaming, and content creation. While these methods excel at interpolating within the training data, the challenge of generalizing to new scenes and objects from very sparse views persists. Specifically, modeling 3D humans from sparse views presents formidable hurdles due to the inherent complexity of human geometry, resulting in inaccurate reconstructions of geometry and textures. To tackle this challenge, this paper leverages recent advancements in Gaussian Splatting and introduces a new method to learn generalizable human Gaussians that allows photorealistic and accurate view-rendering of a new human subject from a limited set of sparse views in a feed-forward manner. A pivotal innovation of our approach involves reformulating the learning of 3D Gaussian parameters into a regression process defined on the 2D UV space of a human template, which allows leveraging the strong geometry prior and the advantages of 2D convolutions. In addition, a multi-scaffold is proposed to effectively represent the offset details. Our method outperforms recent methods on both within-dataset generalization as well as cross-dataset generalization settings.

What problem does this paper attempt to address?

The paper addresses how to render photorealistic, accurate novel views of a previously unseen human subject when only a very sparse set of input views is available. Existing neural rendering methods such as NeRF and Gaussian Splatting interpolate well within their training data, but they struggle to generalize to new scenes and objects, especially from only a few views. Humans are a particularly hard case: articulated joints, self-occlusion, and complex surface geometry such as hair make it difficult to model a 3D body from sparse views, and the resulting reconstructions have inaccurate geometry and texture.

To solve this problem, the paper proposes Generalizable Human Gaussians (GHG), a method that renders novel views of new human subjects from a limited set of sparse views in a feed-forward manner, without any test-time optimization or fine-tuning. The main contributions of GHG are:

1. **A new feed-forward method**: The learning of 3D Gaussian parameters is reformulated as a regression task in the 2D UV space of a human template model, which enables accurate and realistic novel-view rendering of new subjects from very sparse input views.
2. **Multi-scaffold representation**: To reduce the gap between the template model and the real human geometry, Gaussian parameters are learned in multiple scaffold spaces rather than one. This represents displacements more comprehensively than a single template mesh space can.

With these innovations, GHG outperforms existing methods on sparse-view novel view synthesis, especially in the cross-dataset generalization setting.
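The two ideas above (Gaussians defined per texel of a template UV map, and several scaffold surfaces instead of one) can be illustrated with a small NumPy sketch. It is not the paper's implementation. Everything here is an assumption made for illustration: the function names, the channel layout of `param_maps` (3 offset channels followed by 1 opacity logit), the choice to build scaffolds by shifting the template along its normals, and the flat toy template. In a real system a 2D convolutional network would regress `param_maps` from the input views; here it is filled with zeros.

```python
import numpy as np

def make_scaffold_position_maps(position_map, normal_map, offsets):
    """Build one UV position map per scaffold (hypothetical helper).

    position_map, normal_map: (H, W, 3) template surface point and normal
    stored at each UV texel. Each scaffold is assumed to be the template
    shifted along its unit normals by a fixed distance.
    """
    unit_normals = normal_map / np.linalg.norm(normal_map, axis=-1, keepdims=True)
    return np.stack([position_map + d * unit_normals for d in offsets])  # (S, H, W, 3)

def decode_gaussians(scaffold_maps, param_maps, valid_mask):
    """Turn per-texel regressed parameters into a flat set of Gaussians.

    scaffold_maps: (S, H, W, 3) anchor position of every texel per scaffold.
    param_maps:    (S, H, W, 4) assumed layout: xyz offset, opacity logit.
    valid_mask:    (H, W) bool, True where a texel lies on the template.
    """
    anchors = scaffold_maps[:, valid_mask]        # (S, N, 3)
    params = param_maps[:, valid_mask]            # (S, N, 4)
    means = anchors + params[..., :3]             # anchor plus learned offset
    opacity = 1.0 / (1.0 + np.exp(-params[..., 3]))  # sigmoid of the logit
    return means.reshape(-1, 3), opacity.reshape(-1)

# Toy template: a flat 4x4 UV patch in the z=0 plane with normals along +z.
H, W = 4, 4
u, v = np.meshgrid(np.linspace(0.0, 1.0, W), np.linspace(0.0, 1.0, H))
position_map = np.stack([u, v, np.zeros_like(u)], axis=-1)
normal_map = np.zeros((H, W, 3))
normal_map[..., 2] = 1.0

valid_mask = np.ones((H, W), dtype=bool)
valid_mask[0, 0] = False                          # pretend one texel is off-surface

offsets = [0.0, 0.01, 0.02]                       # three scaffolds (illustrative values)
scaffold_maps = make_scaffold_position_maps(position_map, normal_map, offsets)

# Stand-in for the network output: zero offsets and zero opacity logits.
param_maps = np.zeros((len(offsets), H, W, 4))

means, opacity = decode_gaussians(scaffold_maps, param_maps, valid_mask)
print(means.shape, opacity.shape)
```

Because the parameters live on a regular 2D grid, a standard image network can predict them, and adding a scaffold only adds another set of maps rather than changing the architecture.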

Generalizable Human Gaussians for Sparse View Synthesis
