Abstract: Differentiable rendering techniques have recently shown promising results for free-viewpoint video synthesis of characters. However, such methods, either Gaussian Splatting or neural implicit rendering, typically necessitate per-subject optimization which does not meet the requirement of real-time rendering in an interactive application. We propose a generalizable Gaussian Splatting approach for high-resolution image rendering under a sparse-view camera setting. To this end, we introduce Gaussian parameter maps defined on the source views and directly regress Gaussian properties for instant novel view synthesis without any fine-tuning or optimization. We train our Gaussian parameter regression module on human-only data or human-scene data, jointly with a depth estimation module to lift 2D parameter maps to 3D space. The proposed framework is fully differentiable with both depth and rendering supervision or with only rendering supervision. We further introduce a regularization term and an epipolar attention mechanism to preserve geometry consistency between two source views, especially when neglecting depth supervision. Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

The paper addresses real-time rendering of free-viewpoint video (FVV) from sparse-view inputs. Specifically, it proposes a generalizable high-resolution rendering method that achieves instant novel view synthesis under a sparse camera setup, without optimization or fine-tuning for each scene. The main problems can be summarized as follows:

1. **Requirement for real-time rendering**: Existing differentiable rendering techniques (such as Gaussian Splatting and neural implicit rendering) usually require per-subject optimization, which does not meet the requirements of real-time interactive applications.
2. **High-resolution rendering**: How to efficiently generate high-quality, high-resolution images from sparse views, especially in the absence of depth supervision.
3. **Geometric consistency**: How to maintain geometric consistency between two source views, especially in the absence of depth supervision.

### Solutions

The paper proposes a generalizable Gaussian Splatting method (GPS-Gaussian+), which addresses the above problems through the following key techniques:

1. **Pixel-wise Gaussian parameter maps**: Gaussian parameter maps (depth residual, color, scale, rotation, and opacity) are defined on the source-view image planes, giving a pixel-wise 3D Gaussian representation.
2. **Binocular stereo matching**: Binocular stereo matching provides geometric cues and is combined with the 3D Gaussian rendering pipeline to achieve efficient depth estimation.
3. **Fully differentiable framework**: The entire framework is fully differentiable, so the depth estimation module and the Gaussian parameter regression module can be trained jointly, using the rendering loss and, when depth supervision is available, the depth loss.
4. **Geometric regularization**: A geometric regularization term and an epipolar attention mechanism maintain geometric consistency between the two source views in the absence of depth supervision.
5. **Real-time performance**: An efficient 2D convolutional network and fast rendering generate high-fidelity free-viewpoint video at approximately 25 FPS on modern graphics cards.

### Experimental results

The experiments show that the method outperforms existing state-of-the-art methods on multiple datasets in rendering quality while also rendering in real time. The method also generalizes well to unseen human subjects and complex scenes.

### Summary

The paper addresses real-time free-viewpoint video generation from sparse-view inputs by proposing a generalizable Gaussian Splatting method, providing a new solution for real-time interactive applications.
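The lifting step mentioned in the solutions above (stereo matching yields per-pixel depth, which places each pixel's Gaussian in 3D) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a rectified stereo pair, a pinhole camera, and the standard relation depth = focal length × baseline / disparity. The function names `disparity_to_depth` and `unproject_depth`, and all parameter names, are hypothetical and chosen only for illustration.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Rectified-stereo relation: depth = focal * baseline / disparity.

    Assumes a rectified pair; eps guards against division by zero.
    """
    return focal_px * baseline_m / np.maximum(disparity, eps)


def unproject_depth(depth, K, cam_to_world):
    """Lift every pixel of an (H, W) depth map to a 3D world-space point.

    K is a 3x3 pinhole intrinsic matrix and cam_to_world a 4x4 pose.
    Returns an (H, W, 3) array: one candidate Gaussian center per pixel.
    """
    h, w = depth.shape
    # Pixel grid: u runs along the width, v along the height.
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                    # camera rays with z = 1
    pts_cam = rays * depth[..., None]                  # scale rays by depth
    rotation, translation = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ rotation.T + translation


# Toy example: 2x4 image, constant disparity, identity camera pose.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
disparity = np.full((2, 4), 10.0)
depth = disparity_to_depth(disparity, focal_px=100.0, baseline_m=0.5)
centers = unproject_depth(depth, K, np.eye(4))
print(centers.shape)  # (2, 4, 3)
```

In a pixel-wise representation of this kind, the remaining per-pixel attributes (color, scale, rotation, opacity) would be stored in maps of the same height and width, so flattening `centers` to shape `(H*W, 3)` gives one Gaussian per source pixel.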

GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time

MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo

4D Gaussian Splatting for Real-Time Dynamic Scene Rendering

FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes

Mixed 3D Gaussian for Dynamic Scenes Representation and Rendering

Generalizable Human Gaussians for Sparse View Synthesis

Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis

WE-GS: An In-the-wild Efficient 3D Gaussian Representation for Unconstrained Photo Collections

Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis

Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting

A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis

Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians

Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction

GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering

GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis

SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting

3D Gaussian Splatting for Real-Time Radiance Field Rendering