Abstract: Generalizable 3D Gaussian splatting (3DGS) can reconstruct new scenes from sparse-view observations in a feed-forward inference manner, eliminating the need for scene-specific retraining required in conventional 3DGS. However, existing methods rely heavily on epipolar priors, which can be unreliable in complex real-world scenes, particularly in non-overlapping and occluded regions. In this paper, we propose eFreeSplat, an efficient feed-forward 3DGS-based model for generalizable novel view synthesis that operates independently of epipolar line constraints. To enhance multiview feature extraction with 3D perception, we employ a self-supervised Vision Transformer (ViT) with cross-view completion pre-training on large-scale datasets. Additionally, we introduce an Iterative Cross-view Gaussians Alignment method to ensure consistent depth scales across different views. Our eFreeSplat represents an innovative approach for generalizable novel view synthesis. Different from the existing pure geometry-free methods, eFreeSplat focuses more on achieving epipolar-free feature matching and encoding by providing 3D priors through cross-view pretraining. We evaluate eFreeSplat on wide-baseline novel view synthesis tasks using the RealEstate10K and ACID datasets. Extensive experiments demonstrate that eFreeSplat surpasses state-of-the-art baselines that rely on epipolar priors, achieving superior geometry reconstruction and novel view synthesis quality. Project page: https://tatakai1.github.io/efreesplat/

What problem does this paper attempt to address?

The paper addresses the over-reliance of existing methods on epipolar priors when synthesizing new views from sparse-view observations. These priors can make geometric reconstruction unreliable in complex real-world scenes, especially in non-overlapping or occluded regions. The paper proposes eFreeSplat, a method for generalizable novel view synthesis (GNVS) that does not rely on epipolar priors. The goal is to improve geometric perception and appearance quality in non-overlapping and occluded regions, and thereby to achieve better novel view synthesis across a wider range of scenes.

### Main Contributions

1. **Proposing eFreeSplat**: A generalizable 3D Gaussian splatting model whose multi-view geometric perception does not depend on epipolar priors, which makes it more robust on new scenes with sparse and non-overlapping observations.
2. **Introducing the Iterative Cross-view Gaussians Alignment (ICGA) method**: This method keeps depth scales consistent across views and reduces artifacts and pixel displacement in rendering.
3. **Experimental verification**: eFreeSplat outperforms existing epipolar-prior-based methods on the RealEstate10K and ACID datasets in cross-scene rendering performance.

### Method Overview

1. **Cross-view mutual perception without epipolar priors**:
   - Use a pre-trained Vision Transformer (ViT) and a cross-attention decoder to extract cross-view image features.
   - Obtain strong 3D prior information from the self-supervised cross-view completion task, so that 3D perception does not rely on epipolar priors.
2. **Iterative cross-view Gaussians alignment**:
   - Use a 2D U-Net to predict the Gaussian depth and features of each pixel.
   - Compute the similarity between warped features and the corresponding features, and iteratively update the positions and features of the Gaussians so that depth scales stay consistent across views.
3. **Gaussian parameter prediction**:
   - Compute the Gaussian center positions of each view from the refined depth and the camera parameters.
   - Use an additional U-Net to predict the remaining Gaussian parameters (such as covariance matrix, opacity, and spherical harmonic coefficients).

### Experimental Results

- **Quantitative comparison**: On the RealEstate10K and ACID datasets, eFreeSplat is superior to existing SOTA methods in terms of the PSNR, SSIM, and LPIPS metrics.
- **Qualitative comparison**: eFreeSplat shows fewer artifacts and object deformations when rendering non-overlapping and occluded regions, which are the most challenging areas for epipolar-based methods.

### Summary

eFreeSplat improves the robustness and quality of novel view synthesis in complex real-world scenes by not depending on epipolar priors. It performs well in non-overlapping and occluded regions and provides a new solution for 3D vision tasks.
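
The Gaussian parameter prediction step above derives each view's Gaussian centers from the refined depth and the camera parameters. A minimal sketch of that back-projection follows. It assumes a pinhole camera, depth measured along the camera z-axis, pixel centers at half-integer coordinates, and a camera-to-world extrinsic matrix. The function name `unproject_depth` and all of these conventions are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def unproject_depth(depth, K, cam_to_world):
    """Back-project a per-pixel depth map to world-space 3D points.

    Hypothetical helper for illustration only.
    depth:        (H, W) depth along the camera z-axis (assumed convention).
    K:            (3, 3) pinhole intrinsics.
    cam_to_world: (4, 4) camera-to-world extrinsic matrix (assumed convention).
    Returns an (H, W, 3) array with one candidate Gaussian center per pixel.
    """
    H, W = depth.shape
    # Pixel-center coordinates: u runs along the width, v along the height.
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # (H, W, 3)
    # Apply K^{-1} to each homogeneous pixel to get a camera-space ray with z = 1.
    rays = pix @ np.linalg.inv(K).T
    # Scale each ray by its depth to get the camera-space point.
    pts_cam = rays * depth[..., None]
    # Rotate and translate into world coordinates.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t
```

With one center per pixel, a wrong depth scale in one view shifts all of that view's points along their rays. That is the kind of cross-view inconsistency the alignment step is described as correcting.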

Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis

FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes

HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction

PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting

TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers

Binocular-Guided 3D Gaussian Splatting with View Consistency for Sparse View Synthesis

FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting

3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors

GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time

Splatter-360: Generalizable 360$^{\circ}$ Gaussian Splatting for Wide-baseline Panoramic Images

MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

FewViewGS: Gaussian Splatting with Few View Matching and Multi-stage Training

Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction

Self-Ensembling Gaussian Splatting for Few-shot Novel View Synthesis

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

Feature Splatting for Better Novel View Synthesis with Low Overlap

EVA-Gaussian: 3D Gaussian-based Real-time Human Novel View Synthesis under Diverse Camera Settings

Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting

SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting

Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis

Structure Consistent Gaussian Splatting with Matching Prior for Few-shot Novel View Synthesis