Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis

Zhiyuan Min, Yawei Luo, Jianwen Sun, Yi Yang
2024-10-31
Abstract: Generalizable 3D Gaussian splatting (3DGS) can reconstruct new scenes from sparse-view observations in a feed-forward inference manner, eliminating the need for the scene-specific retraining required in conventional 3DGS. However, existing methods rely heavily on epipolar priors, which can be unreliable in complex real-world scenes, particularly in non-overlapping and occluded regions. In this paper, we propose eFreeSplat, an efficient feed-forward 3DGS-based model for generalizable novel view synthesis that operates independently of epipolar line constraints. To enhance multiview feature extraction with 3D perception, we employ a self-supervised Vision Transformer (ViT) with cross-view completion pre-training on large-scale datasets. Additionally, we introduce an Iterative Cross-view Gaussians Alignment method to ensure consistent depth scales across different views. Our eFreeSplat represents an innovative approach for generalizable novel view synthesis. Different from the existing pure geometry-free methods, eFreeSplat focuses more on achieving epipolar-free feature matching and encoding by providing 3D priors through cross-view pretraining. We evaluate eFreeSplat on wide-baseline novel view synthesis tasks using the RealEstate10K and ACID datasets. Extensive experiments demonstrate that eFreeSplat surpasses state-of-the-art baselines that rely on epipolar priors, achieving superior geometry reconstruction and novel view synthesis quality. Project page: https://tatakai1.github.io/efreesplat/
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the over-reliance of existing methods on epipolar priors when synthesizing novel views from sparse-view observations. In complex real-world scenes these priors can be unreliable, which leads to poor geometric reconstruction, especially in non-overlapping or occluded regions. The paper proposes eFreeSplat, a method that aims to achieve generalizable novel view synthesis (GNVS) without relying on epipolar priors. The goal is to improve geometric perception and appearance quality in non-overlapping and occluded regions, and so to obtain better novel view synthesis across a wider range of scenes.

### Main Contributions

1. **Proposing eFreeSplat**: A generalizable 3D Gaussian splatting model whose multi-view geometric perception does not depend on epipolar priors, making it more robust on new scenes with sparse and non-overlapping observations.
2. **Introducing the Iterative Cross-view Gaussians Alignment (ICGA) method**: This method keeps depth scales consistent across views and reduces artifacts and pixel displacement in rendering.
3. **Experimental verification**: eFreeSplat outperforms existing epipolar-prior-based methods on the RealEstate10K and ACID datasets, demonstrating stronger cross-scene rendering performance.

### Method Overview

1. **Cross-view mutual perception without epipolar priors**:
   - Use a pre-trained Vision Transformer (ViT) and a cross-attention decoder to extract cross-view image features.
   - Obtain strong 3D priors from a self-supervised cross-view completion task, achieving 3D perception without relying on epipolar priors.
2. **Iterative cross-view Gaussian alignment**:
   - Use a 2D U-Net to predict the depth and features of the Gaussian at each pixel.
   - Compute the similarity between warped features and the corresponding features, and iteratively update the positions and features of the Gaussians so that depth scales stay consistent across views.
3. **Gaussian parameter prediction**:
   - Compute the Gaussian centers of each view from the refined depth and the camera parameters.
   - Use an additional U-Net to predict the remaining Gaussian parameters (covariance matrix, opacity, and spherical harmonic coefficients).

### Experimental Results

- **Quantitative comparison**: On the RealEstate10K and ACID datasets, eFreeSplat outperforms existing state-of-the-art methods on the PSNR, SSIM, and LPIPS metrics.
- **Qualitative comparison**: eFreeSplat shows fewer artifacts and object deformations when rendering non-overlapping and occluded regions, indicating its advantage in these challenging areas.

### Summary

By dropping the dependence on epipolar priors, eFreeSplat improves the robustness and quality of novel view synthesis in complex real-world scenes. It performs well in non-overlapping and occluded regions and offers a new approach for 3D vision tasks.
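
The geometric steps in the method overview (computing Gaussian centers from depth and camera parameters, then warping them into another view to compare features) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a pinhole camera with intrinsics `(fx, fy, cx, cy)`, camera-to-world poses given as a rotation matrix and translation, and cosine similarity as the feature comparison; the source specifies none of these, and every function name below is hypothetical.

```python
import math

def unproject(u, v, depth, intrinsics, cam_to_world):
    """Back-project pixel (u, v) with the given depth to a 3D world point.

    Assumed pinhole model: this point would serve as a Gaussian center.
    """
    fx, fy, cx, cy = intrinsics
    rot, trans = cam_to_world
    # Point in camera space on the ray through the pixel, scaled by depth.
    cam = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Rigid transform to world space: X_world = R @ X_cam + t.
    return tuple(
        sum(rot[i][j] * cam[j] for j in range(3)) + trans[i] for i in range(3)
    )

def project(point, intrinsics, cam_to_world):
    """Project a world point into a view; returns (u, v, depth) in that view."""
    fx, fy, cx, cy = intrinsics
    rot, trans = cam_to_world
    # Invert the rigid transform: X_cam = R^T @ (X_world - t).
    delta = [point[i] - trans[i] for i in range(3)]
    x, y, z = (sum(rot[j][i] * delta[j] for j in range(3)) for i in range(3))
    return (fx * x / z + cx, fy * y / z + cy, z)

def cosine_similarity(a, b):
    """Cosine similarity of two feature vectors (an assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)

# Toy setup: two views sharing intrinsics, view B shifted 0.5 units along x.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
intrinsics = (100.0, 100.0, 32.0, 32.0)
view_a = (IDENTITY, [0.0, 0.0, 0.0])
view_b = (IDENTITY, [0.5, 0.0, 0.0])

# Gaussian center for the central pixel of view A at depth 2.
center = unproject(32.0, 32.0, 2.0, intrinsics, view_a)
# Where that center lands in view B, and its depth as seen from view B.
u_b, v_b, depth_b = project(center, intrinsics, view_b)
# Placeholder feature vectors standing in for the warped and target features.
similarity = cosine_similarity([1.0, 0.0], [1.0, 0.0])
print(center, (u_b, v_b, depth_b), similarity)
```

In this toy setup the central pixel of view A at depth 2 lands at pixel (7, 32) in view B with the same depth. A full pipeline would sample view B's feature map at that location and use the resulting similarity to refine the predicted depth. The sketch uses plain Python for clarity; a practical version would vectorize these operations over all pixels.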