Abstract: Synthesizing accurate geometry and photo-realistic appearance of small scenes is an active area of research with compelling use cases in gaming, virtual reality, robotic manipulation, autonomous driving, convenient product capture, and consumer-level photography. When applying scene geometry and appearance estimation techniques to robotics, we found that the narrow cone of possible viewpoints due to the limited range of robot motion and scene clutter caused current estimation techniques to produce poor-quality estimates or even fail. On the other hand, in robotic applications, dense metric depth can often be measured directly using stereo and illumination can be controlled. Depth can provide a good initial estimate of the object geometry to improve reconstruction, while multi-illumination images can facilitate relighting. In this work we demonstrate a method to incorporate dense metric depth into the training of neural 3D representations and address an artifact observed while jointly refining geometry and appearance by disambiguating between texture and geometry edges. We also discuss a multi-flash stereo camera system developed to capture the necessary data for our pipeline and show results on relighting and view synthesis with a few training views.

What problem does this paper attempt to address?

This paper addresses how to improve geometry and appearance reconstruction of small-scale scenes by incorporating dense metric depth into neural 3D representations, with a focus on view synthesis and relighting. The authors found that in robotic applications, the limited range of robot motion and scene clutter restrict the available viewpoints to a narrow cone, so current geometry and appearance estimation techniques produce poor-quality results or fail outright. They therefore propose a method that uses dense metric depth together with multi-view, multi-illumination images when training neural 3D representations.

### Core Problems of the Paper

1. **Accurate Reconstruction of Geometry and Appearance**: Small-scale scenes in applications such as robotic manipulation, autonomous driving, convenient product capture, and consumer-level photography require accurate geometry and realistic appearance.
2. **View Synthesis and Relighting**: High-quality view synthesis and relighting must be achieved with only a small number of training views.
3. **Distinguishing between Texture and Geometric Edges**: When geometry and appearance are optimized jointly, existing methods have difficulty telling texture edges from geometric edges, which produces artifacts in the reconstruction.

### Solutions

- **Introducing Dense Metric Depth**: The dense metric depth captured by the stereo camera system serves as an additional supervision signal to improve geometric reconstruction.
- **Multi-Flash Stereo Camera System**: The authors built a multi-flash stereo camera system from off-the-shelf components that captures data under multi-view and multi-illumination conditions.
- **Utilization of Depth Edges**: Depth edges are introduced as a supervision signal to address the artifacts that arise when geometry and appearance are optimized jointly.
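
The idea of using measured depth as an extra supervision signal can be illustrated with a minimal sketch. This is not the paper's formulation. It assumes a NeRF-style volume renderer, an L1 penalty, weights normalized along each ray, and the convention that a measured depth of `None` or a non-positive value marks a pixel without a stereo measurement. The function names `render_weights`, `expected_depth`, and `depth_supervision_loss` are hypothetical.

```python
import math


def render_weights(densities, deltas):
    """Volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i)).

    densities: per-sample volume density along one ray.
    deltas: spacing between consecutive samples along the ray.
    """
    weights = []
    transmittance = 1.0  # fraction of light that reaches the current sample
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def expected_depth(weights, sample_depths):
    """Weighted mean of the sample depths (assumed normalization).

    Returns None when the ray hits nothing, i.e. all weights are zero.
    """
    total = sum(weights)
    if total <= 0.0:
        return None
    return sum(w * t for w, t in zip(weights, sample_depths)) / total


def depth_supervision_loss(rendered, measured):
    """Mean L1 error between rendered and measured depth.

    Pixels without a rendered depth or without a valid measurement
    (None or <= 0, an assumed convention) are skipped.
    """
    errors = [
        abs(r - m)
        for r, m in zip(rendered, measured)
        if r is not None and m is not None and m > 0.0
    ]
    return sum(errors) / len(errors) if errors else 0.0
```

In a training loop, a term like this would typically be added to the photometric loss with a weighting factor, so that the measured depth constrains geometry even when only a few views are available.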

### Experimental Verification

The paper evaluates the proposed method through a series of experiments:

- **Geometric Reconstruction Accuracy**: Tests on synthetic and real-world datasets show that the method achieves reconstruction accuracy comparable to or better than existing methods while using less training data and fewer gradient steps.
- **View Synthesis and Relighting**: The results demonstrate a significant improvement in view synthesis and relighting quality with a small number of training views.
- **Impact of Depth Edges**: Comparative experiments confirm the importance of depth edges in training, especially for complex geometric structures.

In general, this paper aims to improve the performance of neural 3D representations in the geometry and appearance reconstruction of small-scale scenes by introducing dense metric depth and other auxiliary information, particularly in response to the challenges of robotic applications.

Incorporating dense metric depth into neural 3D representations for view synthesis and relighting

DCL: Differential Contrastive Learning for Geometry-Aware Depth Synthesis

Depth Generation Network: Estimating Real World Depth From Stereo And Depth Images

A Learning-Based Method Using Epipolar Geometry for Light Field Depth Estimation

Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images

Depth Reconstruction with Neural Signed Distance Fields in Structured Light Systems

Aug3D-RPN: Improving Monocular 3D Object Detection by Synthetic Images with Virtual Depth

R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras

Sparse Views, Near Light: A Practical Paradigm for Uncalibrated Point-light Photometric Stereo

Depth assisted novel view synthesis using few images

DELTAS: Depth Estimation by Learning Triangulation And densification of Sparse points

Incremental Dense Reconstruction from Monocular Video with Guided Sparse Feature Volume Fusion

Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis

Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis

DeLiRa: Self-Supervised Depth, Light, and Radiance Fields

Enhancing View Synthesis with Depth-Guided Neural Radiance Fields and Improved Depth Completion

Single-Shot Metric Depth from Focused Plenoptic Cameras

SCADE: NeRFs from Space Carving with Ambiguity-Aware Depth Estimates

Endo-4DGS: Endoscopic Monocular Scene Reconstruction with 4D Gaussian Splatting

Distilled Visual and Robot Kinematics Embeddings for Metric Depth Estimation in Monocular Scene Reconstruction

Real-time Acquisition and Reconstruction of Dynamic Volumes with Neural Structured Illumination