Haofei Xu, Songyou Peng, Fangjinhua Wang, Hermann Blum, Daniel Barath, Andreas Geiger, Marc Pollefeys
Abstract: Gaussian splatting and single/multi-view depth estimation are typically studied in isolation. In this paper, we present DepthSplat to connect Gaussian splatting and depth estimation and study their interactions. More specifically, we first contribute a robust multi-view depth model by leveraging pre-trained monocular depth features, leading to high-quality feed-forward 3D Gaussian splatting reconstructions. We also show that Gaussian splatting can serve as an unsupervised pre-training objective for learning powerful depth models from large-scale unlabelled datasets. We validate the synergy between Gaussian splatting and depth estimation through extensive ablation and cross-task transfer experiments. Our DepthSplat achieves state-of-the-art performance on ScanNet, RealEstate10K and DL3DV datasets in terms of both depth estimation and novel view synthesis, demonstrating the mutual benefits of connecting both tasks. Our code, models, and video results are available at https://haofeixu.github.io/depthsplat/.
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve
This paper addresses the fact that **Multi-View Depth Estimation** and **Gaussian Splatting** are usually studied in isolation. The authors propose **DepthSplat**, which combines pre-trained monocular depth features with multi-view feature matching so that each task benefits the other.
#### Main Issues:
1. **Limitations of Multi-View Depth Estimation**:
- Existing multi-view depth estimation methods perform poorly when dealing with challenging situations such as occlusions, textureless regions, and reflective surfaces.
- These methods typically rely on the assumption of multi-view photometric consistency, which often does not hold in real-world scenarios.
2. **Limitations of Monocular Depth Estimation**:
- Although monocular depth estimation has made significant progress on diverse in-the-wild data, it suffers from scale ambiguity and insufficient multi-view consistency, limiting its application in downstream tasks such as 3D reconstruction and video depth estimation.
3. **Limitations of Gaussian Splatting**:
- Feed-forward Gaussian Splatting models can reconstruct a scene from sparse input views, but they still struggle in challenging regions such as textureless areas and reflective surfaces.
- Current methods rely mainly on multi-view feature matching, which becomes unreliable in such regions.
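To make the matching problem concrete, here is a minimal 1D sketch of why feature matching breaks down in textureless regions. It is an illustration only, not the paper's matching module: the function `matching_costs`, the signal length, and the circular shift via `np.roll` are all assumptions chosen for brevity.

```python
import numpy as np


def matching_costs(ref, src, max_disp):
    """Mean absolute difference between `ref` and `src` for each candidate shift.

    Assumption: shifts are circular (np.roll), which keeps the toy example simple.
    """
    costs = []
    for d in range(max_disp + 1):
        warped = np.roll(src, -d)  # undo a candidate shift of d pixels
        costs.append(np.mean(np.abs(ref - warped)))
    return np.array(costs)


rng = np.random.default_rng(0)
true_disp = 3

# Textured scanline: random intensities give a distinctive pattern.
textured_ref = rng.random(64)
textured_src = np.roll(textured_ref, true_disp)

# Textureless scanline: constant intensity, every shift looks the same.
flat_ref = np.full(64, 0.5)
flat_src = np.roll(flat_ref, true_disp)

textured_costs = matching_costs(textured_ref, textured_src, max_disp=8)
flat_costs = matching_costs(flat_ref, flat_src, max_disp=8)

best_textured = int(np.argmin(textured_costs))  # unique minimum at the true shift
flat_is_ambiguous = bool(np.ptp(flat_costs) == 0.0)  # all candidates tie
```

In the textured case the cost curve has a single clear minimum, while in the flat case every candidate scores the same, so matching alone cannot pick a depth. This is the kind of gap that an additional cue, such as monocular depth features, is meant to fill.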
### Solution:
**DepthSplat** combines monocular depth features with multi-view feature matching to improve both multi-view depth estimation and Gaussian Splatting. Its contributions are:
1. **Robust Multi-View Depth Model**:
- Enhancing the multi-view feature matching branch with pre-trained monocular depth features to improve the robustness of multi-view depth estimation.
- Providing more consistent results in difficult matching situations such as occlusions, textureless regions, and reflective surfaces.
2. **Improvement in Gaussian Splatting**:
- Reprojecting the predicted multi-view depth maps into 3D space as Gaussian centers and using a lightweight network to predict other Gaussian parameters, achieving high-quality novel view synthesis.
- The new Gaussian Splatting module is fully differentiable and can optimize all model components with only photometric supervision, offering a new approach for unsupervised pre-training on large-scale unlabeled datasets.
3. **Unsupervised Pre-Training**:
- Using the Gaussian Splatting task for unsupervised pre-training can generate high-quality depth models, which can be further fine-tuned on specific depth tasks, achieving better results than training from scratch.
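The step of reprojecting a depth map into 3D to obtain Gaussian centers can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: it assumes a pinhole camera with intrinsics `K`, a 4x4 camera-to-world pose, and integer pixel coordinates (some implementations add a half-pixel offset). The function name `unproject_depth` is hypothetical, and the remaining Gaussian parameters, which the summary says come from a lightweight network, are not modeled here.

```python
import numpy as np


def unproject_depth(depth, K, cam_to_world):
    """Lift an (H, W) depth map to (H*W, 3) world-space points.

    Assumptions: pinhole intrinsics K (3x3), pose maps camera to world (4x4),
    and depth is measured along the camera z-axis.
    """
    H, W = depth.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)

    rays = pixels @ np.linalg.inv(K).T        # K^-1 [u, v, 1]^T for every pixel
    points_cam = rays * depth.reshape(-1, 1)  # scale each ray by its depth

    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return points_cam @ R.T + t               # rotate and translate into world space


# Toy example: 2x2 depth map, camera shifted 10 units along world x.
K = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0]])
pose = np.eye(4)
pose[:3, 3] = [10.0, 0.0, 0.0]
depth = np.full((2, 2), 4.0)

centers = unproject_depth(depth, K, pose)  # one candidate Gaussian center per pixel
```

Because every operation here is differentiable with respect to the depth values, a photometric loss on rendered views can in principle propagate gradients back to the depth predictor, which is what makes the unsupervised pre-training described above possible.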
### Experimental Results:
- **Depth Estimation**: DepthSplat achieves state-of-the-art performance on multiple metrics across datasets such as TartanAir, ScanNet, and KITTI.
- **Novel View Synthesis**: DepthSplat also reaches state-of-the-art quality on RealEstate10K and DL3DV, supporting the claim that connecting Gaussian Splatting and depth estimation benefits both tasks.
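The summary does not say which metrics are meant. As an illustration only, two metrics commonly reported for these tasks are absolute relative error for depth and PSNR for novel view synthesis. The sketch below assumes those two, with images scaled to [0, 1] and ground-truth depth of zero treated as invalid; the function names are hypothetical.

```python
import numpy as np


def abs_rel(pred_depth, gt_depth):
    """Mean of |pred - gt| / gt over pixels with valid (positive) ground truth."""
    valid = gt_depth > 0
    return float(np.mean(np.abs(pred_depth[valid] - gt_depth[valid]) / gt_depth[valid]))


def psnr(rendered, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a rendered and a target image."""
    mse = np.mean((rendered - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


# Toy inputs: the third depth pixel has no ground truth and is ignored.
depth_error = abs_rel(np.array([1.1, 2.2, 5.0]), np.array([1.0, 2.0, 0.0]))
image_quality = psnr(np.zeros((4, 4, 3)), np.full((4, 4, 3), 0.1))
```

Lower is better for absolute relative error, and higher is better for PSNR.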
In summary, the paper connects multi-view depth estimation and Gaussian Splatting, which are usually studied in isolation, through the DepthSplat model, and reports improved performance on both tasks.