Abstract: Recently, 3D Gaussian Splatting (3DGS) has emerged as an efficient approach for accurately representing scenes. However, despite its superior novel view synthesis capabilities, extracting the geometry of the scene directly from the Gaussian properties remains a challenge, as those properties are optimized based on a photometric loss. While some concurrent models have tried adding geometric constraints during the Gaussian optimization process, they still produce noisy, unrealistic surfaces. We propose a novel approach for bridging the gap between the noisy 3DGS representation and the smooth 3D mesh representation, by injecting real-world knowledge into the depth extraction process. Rather than extracting the geometry of the scene directly from the Gaussian properties, we extract it through a pre-trained stereo-matching model. We render stereo-aligned pairs of images corresponding to the original training poses, feed the pairs into a stereo model to get a depth profile, and finally fuse all of the profiles together to get a single mesh. The resulting reconstruction is smoother, more accurate, and shows more intricate details compared to other methods for surface reconstruction from Gaussian Splatting, while requiring only a small overhead on top of the fairly short 3DGS optimization process. We performed extensive testing of the proposed method on in-the-wild scenes, obtained using a smartphone, showcasing its superior reconstruction abilities. Additionally, we tested the method on the Tanks and Temples and DTU benchmarks, achieving state-of-the-art results.

What problem does this paper attempt to address?

The paper aims to address the problem of accurately reconstructing scene surfaces from the 3D Gaussian Splatting (3DGS) model. Although 3DGS performs excellently in novel view synthesis, directly extracting geometric information from optimized Gaussian attributes remains challenging because these attributes are primarily optimized based on photometric loss, resulting in reconstructed surfaces that are noticeably noisy and unrealistic. The proposed method compensates for this shortcoming by introducing real-world knowledge, specifically by using a pre-trained stereo matching model to extract depth information instead of directly relying on the positions of Gaussian elements.

This method includes the following steps:

1. **Scene Capture and Pose Estimation**: First, capture video or images of a static scene and use structure-from-motion algorithms like COLMAP to identify points of interest and infer camera matrices.
2. **3DGS and Novel Stereo View Rendering**: Represent the scene using the 3DGS model and render stereo image pairs aligned with the original image viewpoints, ensuring these image pairs are stereo-calibrated.
3. **Stereo Depth Estimation**: Input the generated stereo image pairs into a pre-trained stereo matching model (such as DLNR) to obtain depth maps for each image pair. To improve reconstruction quality, occlusion masks and depth range-based masks are also applied.
4. **Depth Fusion into Triangular Mesh Surface**: Finally, use the Truncated Signed Distance Function (TSDF) algorithm to aggregate all extracted depth information, generating a smooth and consistent triangular mesh surface.

Through the above method, the researchers not only solve the problem of extracting accurate surfaces from 3DGS but also achieve a fast and efficient surface reconstruction process. Compared to existing neural reconstruction methods, this approach significantly reduces processing time. Additionally, experimental results show that this method performs well on multiple benchmark datasets, particularly on the DTU and Tanks and Temples datasets, demonstrating its advantages in geometric consistency and smoothness.
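The geometric core of steps 2 to 4 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a 4x4 camera-to-world pose whose first column is the camera's rightward x-axis, a rectified stereo pair with disparity in pixels, and a simple running-average TSDF update. The function names (`right_camera_pose`, `disparity_to_depth`, `tsdf_update`) and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np


def right_camera_pose(cam_to_world: np.ndarray, baseline: float) -> np.ndarray:
    """Return the pose of a second camera shifted by `baseline` along the
    first camera's own x-axis (assumed to be column 0 of the rotation)."""
    pose = cam_to_world.copy()
    pose[:3, 3] += baseline * pose[:3, 0]
    return pose


def disparity_to_depth(disparity, focal_px, baseline, min_depth, max_depth):
    """Convert rectified-stereo disparity (pixels) to depth via
    depth = focal_px * baseline / disparity, and build a depth-range mask."""
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    positive = disparity > 0  # zero or negative disparity has no finite depth
    depth[positive] = focal_px * baseline / disparity[positive]
    valid = positive & (depth >= min_depth) & (depth <= max_depth)
    return depth, valid


def tsdf_update(tsdf, weight, sdf_obs, trunc):
    """Fold one observed signed distance per voxel into a running TSDF average.
    Voxels further than `trunc` behind the observed surface are left untouched."""
    sdf_obs = np.asarray(sdf_obs, dtype=np.float64)
    update = sdf_obs >= -trunc
    d = np.clip(sdf_obs / trunc, -1.0, 1.0)  # truncate and normalize to [-1, 1]
    new_weight = weight + update  # each accepted observation adds weight 1
    averaged = (tsdf * weight + d) / np.maximum(new_weight, 1.0)
    new_tsdf = np.where(update, averaged, tsdf)
    return new_tsdf, new_weight


# Illustrative values only (not taken from the paper).
left_pose = np.eye(4)
right_pose = right_camera_pose(left_pose, baseline=0.1)

disparity = np.array([[0.0, 10.0], [50.0, 1.0]])
depth, valid = disparity_to_depth(
    disparity, focal_px=500.0, baseline=0.1, min_depth=0.5, max_depth=10.0
)

tsdf, weight = tsdf_update(
    np.zeros(3), np.zeros(3), sdf_obs=[0.02, -0.5, 0.5], trunc=0.1
)
```

In practice the disparity would come from the stereo network, and the fusion would usually be delegated to an existing TSDF implementation that also extracts the triangle mesh. The sketch only shows why a horizontal baseline yields metric depth and how the depth-range mask and truncation discard unreliable values before fusion.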

GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views
