GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views

Yaniv Wolf, Amit Bracha, Ron Kimmel
2024-07-17
Abstract: Recently, 3D Gaussian Splatting (3DGS) has emerged as an efficient approach for accurately representing scenes. However, despite its superior novel view synthesis capabilities, extracting the geometry of the scene directly from the Gaussian properties remains a challenge, as those are optimized based on a photometric loss. While some concurrent models have tried adding geometric constraints during the Gaussian optimization process, they still produce noisy, unrealistic surfaces. We propose a novel approach for bridging the gap between the noisy 3DGS representation and the smooth 3D mesh representation, by injecting real-world knowledge into the depth extraction process. Instead of extracting the geometry of the scene directly from the Gaussian properties, we extract the geometry through a pre-trained stereo-matching model. We render stereo-aligned pairs of images corresponding to the original training poses, feed the pairs into a stereo model to get a depth profile, and finally fuse all of the profiles together to get a single mesh. The resulting reconstruction is smoother, more accurate and shows more intricate details compared to other methods for surface reconstruction from Gaussian Splatting, while only requiring a small overhead on top of the fairly short 3DGS optimization process. We performed extensive testing of the proposed method on in-the-wild scenes, obtained using a smartphone, showcasing its superior reconstruction abilities. Additionally, we tested the method on the Tanks and Temples and DTU benchmarks, achieving state-of-the-art results.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the problem of accurately reconstructing scene surfaces from a 3D Gaussian Splatting (3DGS) model. Although 3DGS excels at novel view synthesis, extracting geometry directly from the optimized Gaussian attributes remains challenging: the attributes are optimized primarily with a photometric loss, so surfaces derived from them are noticeably noisy and unrealistic. The proposed method compensates for this by introducing real-world knowledge, using a pre-trained stereo matching model to extract depth instead of relying directly on the positions of the Gaussian elements. The method consists of the following steps:

1. **Scene Capture and Pose Estimation**: Capture video or images of a static scene, then use a structure-from-motion tool such as COLMAP to identify points of interest and infer the camera matrices.
2. **3DGS and Novel Stereo View Rendering**: Represent the scene with a 3DGS model and render stereo image pairs aligned with the original image viewpoints, so that each pair is stereo-calibrated.
3. **Stereo Depth Estimation**: Feed the rendered stereo pairs into a pre-trained stereo matching model (such as DLNR) to obtain a depth map for each pair. To improve reconstruction quality, occlusion masks and depth-range masks are also applied.
4. **Depth Fusion into a Triangular Mesh Surface**: Aggregate all extracted depth maps with the Truncated Signed Distance Function (TSDF) algorithm to generate a smooth, consistent triangular mesh surface.

With this pipeline, the researchers extract accurate surfaces from 3DGS while keeping the reconstruction process fast and efficient. Compared to existing neural reconstruction methods, the approach significantly reduces processing time.
Additionally, experimental results show that this method performs well on multiple benchmark datasets, particularly on the DTU and Tanks and Temples datasets, demonstrating its advantages in geometric consistency and smoothness.
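
To make steps 2 and 3 of the summary more concrete, here is a minimal sketch of the geometry behind a rectified stereo pair. It is not the authors' implementation. It assumes world-to-camera extrinsics of the form x_cam = R x_world + t (the convention COLMAP uses), a right camera shifted purely along the left camera's x-axis, and the standard rectified-stereo relation depth = focal length × baseline / disparity. The function names, the baseline value, and the depth thresholds are hypothetical and chosen only for illustration.

```python
def right_view_translation(t_left, baseline):
    """Translation of a right camera that shares the left camera's rotation.

    Assumes world-to-camera extrinsics x_cam = R @ x_world + t. Moving the
    camera centre by `baseline` along the camera's own +x axis leaves R
    unchanged and subtracts `baseline` from the x component of t.
    """
    tx, ty, tz = t_left
    return (tx - baseline, ty, tz)


def camera_center(R, t):
    """Camera centre in world coordinates, C = -R^T t (R is a 3x3 nested list)."""
    return tuple(-sum(R[row][col] * t[row] for row in range(3)) for col in range(3))


def disparity_to_depth(disparity, focal_px, baseline, min_depth, max_depth):
    """Convert a disparity map (in pixels) to depth with a depth-range mask.

    Pixels with non-positive disparity, or with depth outside
    [min_depth, max_depth], are set to 0.0 to mark them as invalid.
    The thresholds are illustrative, not values from the paper.
    """
    depth_map = []
    for row in disparity:
        out_row = []
        for d in row:
            z = focal_px * baseline / d if d > 0 else 0.0
            out_row.append(z if min_depth <= z <= max_depth else 0.0)
        depth_map.append(out_row)
    return depth_map


if __name__ == "__main__":
    # Hypothetical left camera: rotated 90 degrees about the z-axis.
    R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    t_left = (1.0, 2.0, 3.0)
    t_right = right_view_translation(t_left, baseline=0.5)
    print("left centre: ", camera_center(R, t_left))
    print("right centre:", camera_center(R, t_right))

    # Hypothetical 2x2 disparity map, focal length in pixels, baseline in metres.
    disparity = [[10.0, 0.0], [100.0, 1.0]]
    print(disparity_to_depth(disparity, focal_px=500.0, baseline=0.2,
                             min_depth=0.5, max_depth=20.0))
```

In a full pipeline, the masked depth maps would then be passed to a TSDF fusion routine together with the camera intrinsics and poses. Zeroing invalid pixels is one common way to exclude them from fusion, and whether it matches the masking used in the paper is an assumption here.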