Abstract: A commonly observed failure mode of Neural Radiance Field (NeRF) is fitting incorrect geometries when given an insufficient number of input views. One potential reason is that standard volumetric rendering does not enforce the constraint that most of a scene's geometry consists of empty space and opaque surfaces. We formalize the above assumption through DS-NeRF (Depth-supervised Neural Radiance Fields), a loss for learning radiance fields that takes advantage of readily-available depth supervision. We leverage the fact that current NeRF pipelines require images with known camera poses that are typically estimated by running structure-from-motion (SFM). Crucially, SFM also produces sparse 3D points that can be used as "free" depth supervision during training: we add a loss to encourage the distribution of a ray's terminating depth to match a given 3D keypoint, incorporating depth uncertainty. DS-NeRF can render better images given fewer training views while training 2-3x faster. Further, we show that our loss is compatible with other recently proposed NeRF methods, demonstrating that depth is a cheap and easily digestible supervisory signal. Finally, we find that DS-NeRF can support other types of depth supervision such as scanned depth sensors and RGB-D reconstruction outputs.

What problem does this paper attempt to address?

This paper addresses NeRF's (Neural Radiance Fields) tendency to fit incorrect scene geometry when given too few input views. Standard volumetric rendering does not enforce the constraint that most scene geometry consists of empty space and opaque surfaces, so NeRF overfits when training views are scarce and renders novel viewpoints poorly. To overcome this, the authors propose DS-NeRF (Depth-supervised NeRF), which uses depth information recovered from structure-from-motion (SFM) to supervise NeRF training. Specifically, they introduce a loss that encourages the termination distribution of each ray to match the surface prior given by a 3D keypoint, taking depth uncertainty into account. This both reduces overfitting and accelerates training.

### Main Contributions:

1. **Depth Supervision**: Sparse 3D point clouds produced by SFM serve as "free" depth supervision signals, improving NeRF's geometric modeling.
2. **Training Acceleration**: DS-NeRF reaches results comparable to NeRF in 2-3 times fewer training iterations, which is especially notable when the number of input views is limited.
3. **Compatibility**: The proposed depth supervision loss can be combined with other recently proposed NeRF methods to further enhance performance.
4. **Flexibility**: Beyond depth from SFM, DS-NeRF also supports other types of depth supervision, such as scanned depth sensors and RGB-D reconstruction outputs.

### Experimental Results:

- **View Synthesis**: Experiments on multiple datasets (e.g., DTU, NeRF Real, Redwood-3dscan) show that DS-NeRF generates better images from fewer input views and significantly reduces depth error.
- **Training Speed**: DS-NeRF trains faster, particularly with fewer input views, reaching higher PSNR values in fewer iterations.

In summary, by introducing depth supervision, this paper addresses NeRF's overfitting under insufficient input views and improves both training efficiency and view synthesis quality.
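The depth supervision idea summarized above can be illustrated with a minimal NumPy sketch for a single ray. It assumes the usual discrete volume-rendering weights as the ray termination distribution and a Gaussian around the keypoint depth as the target, which follows the spirit of the loss described in the abstract but is not guaranteed to match the paper's exact formulation. The function names (`termination_weights`, `depth_loss`) and parameters (`depth_std`, `eps`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def termination_weights(sigma, t):
    """Discrete ray termination distribution h_k = T_k * alpha_k.

    sigma: densities at the sample depths t (both 1D arrays of equal length).
    Assumption: the last sample reuses the previous spacing as its interval.
    """
    delta = np.diff(t, append=2.0 * t[-1] - t[-2])
    alpha = 1.0 - np.exp(-sigma * delta)
    # Transmittance: probability that the ray reaches sample k unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    return trans * alpha, delta


def depth_loss(sigma, t, depth, depth_std, eps=1e-10):
    """Penalize termination mass that lies away from a keypoint depth.

    The target is an unnormalized Gaussian centered at `depth`; `depth_std`
    stands in for the keypoint's depth uncertainty (an assumed parameter).
    """
    h, delta = termination_weights(sigma, t)
    target = np.exp(-((t - depth) ** 2) / (2.0 * depth_std ** 2))
    return float(np.sum(-np.log(h + eps) * target * delta))


# Usage: a density peak at the keypoint depth scores lower than one elsewhere.
t = np.linspace(2.0, 6.0, 64)
sigma_near = 50.0 * np.exp(-((t - 4.0) ** 2) / (2.0 * 0.1 ** 2))
sigma_far = 50.0 * np.exp(-((t - 5.5) ** 2) / (2.0 * 0.1 ** 2))
loss_near = depth_loss(sigma_near, t, depth=4.0, depth_std=0.1)
loss_far = depth_loss(sigma_far, t, depth=4.0, depth_std=0.1)
```

In a training loop, a term like this would be added to the usual color reconstruction loss only for rays that pass through a keypoint with known depth; the relative weighting between the two terms is a tuning choice not specified here.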

Depth-supervised NeRF: Fewer Views and Faster Training for Free

Depth-guided NeRF Training via Earth Mover's Distance

Enhancing View Synthesis with Depth-Guided Neural Radiance Fields and Improved Depth Completion

DaRF: Boosting Radiance Fields from Sparse Inputs with Monocular Depth Adaptation

Enhancing Neural Radiance Fields with Depth and Normal Completion Priors from Sparse Views

NeRF-SDP: Efficient Generalizable Neural Radiance Field with Scene Depth Perception

Single-view Neural Radiance Fields with Depth Teacher

Simple-RF: Regularizing Sparse Input Radiance Fields with Simpler Solutions

SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis

TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization

NeRFVS: Neural Radiance Fields for Free View Synthesis Via Geometry Scaffolds

Dense Depth Priors for Neural Radiance Fields from Sparse Input Views

AltNeRF: Learning Robust Neural Radiance Field via Alternating Depth-Pose Optimization

DA4NeRF: Depth-aware augmentation technique for neural radiance fields

Depth assisted novel view synthesis using few images

HDPNERF: Hybrid Depth Priors for Neural Radiance Fields from Sparse Input Views

FDC-NeRF: Learning Pose-Free Neural Radiance Fields with Flow-Depth Consistency

FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors

Dip-NeRF: Depth-Based Anti-Aliased Neural Radiance Fields

StructNeRF: Neural Radiance Fields for Indoor Scenes With Structural Hints

LiDeNeRF: Neural radiance field reconstruction with depth prior provided by LiDAR point cloud