Depth-supervised NeRF: Fewer Views and Faster Training for Free

Kangle Deng, Andrew Liu, Jun-Yan Zhu, Deva Ramanan
2024-10-18
Abstract: A commonly observed failure mode of Neural Radiance Field (NeRF) is fitting incorrect geometries when given an insufficient number of input views. One potential reason is that standard volumetric rendering does not enforce the constraint that most of a scene's geometry consists of empty space and opaque surfaces. We formalize the above assumption through DS-NeRF (Depth-supervised Neural Radiance Fields), a loss for learning radiance fields that takes advantage of readily-available depth supervision. We leverage the fact that current NeRF pipelines require images with known camera poses that are typically estimated by running structure-from-motion (SFM). Crucially, SFM also produces sparse 3D points that can be used as "free" depth supervision during training: we add a loss to encourage the distribution of a ray's terminating depth to match a given 3D keypoint, incorporating depth uncertainty. DS-NeRF can render better images given fewer training views while training 2-3x faster. Further, we show that our loss is compatible with other recently proposed NeRF methods, demonstrating that depth is a cheap and easily digestible supervisory signal. Finally, we find that DS-NeRF can support other types of depth supervision such as scanned depth sensors and RGB-D reconstruction outputs.
Computer Vision and Pattern Recognition, Graphics, Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is the tendency of NeRF (Neural Radiance Fields) to fit incorrect scene geometry when given too few input views. Standard volumetric rendering does not enforce the constraint that most scene geometry consists of empty space and opaque surfaces, so NeRF overfits when training views are scarce and performs poorly when rendering new viewpoints.

To overcome this, the authors propose DS-NeRF (Depth-supervised NeRF), which uses depth information recovered by structure-from-motion (SFM) to supervise NeRF training. They introduce a loss function that encourages the termination distribution of each ray to match the surface prior given by the corresponding 3D keypoint. This approach both reduces overfitting and accelerates training.

### Main Contributions:

1. **Depth Supervision**: Sparse 3D point clouds generated by SFM serve as "free" depth supervision signals, improving the geometric modeling capability of NeRF.
2. **Training Acceleration**: DS-NeRF reaches the same results as NeRF in 2-3 times fewer training iterations, an advantage that is most notable when the number of input views is limited.
3. **Compatibility**: The proposed depth supervision loss can be combined with other recently proposed NeRF methods to further enhance performance.
4. **Flexibility**: In addition to depth information generated by SFM, DS-NeRF can support other types of depth supervision, such as scanned depth sensors and RGB-D reconstruction outputs.

### Experimental Results:

- **View Synthesis**: Experiments on multiple datasets (e.g., DTU, NeRF Real, Redwood-3dscan) show that DS-NeRF generates better images with fewer input views and significantly reduces depth error.
- **Training Speed**: DS-NeRF has a clear advantage in training speed, particularly with fewer input views, achieving higher PSNR values in fewer iterations.

In summary, this paper addresses NeRF's overfitting with insufficient input views by introducing depth supervision, significantly improving training efficiency and view synthesis quality.
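
To make the idea of supervising a ray's termination distribution concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the Gaussian-weighted negative log-likelihood form of the loss, and the toy densities are all assumptions chosen for illustration, since the summary above does not give the exact formula. The sketch computes the discrete probability that a ray terminates at each sample, then penalizes low termination probability near a keypoint depth, with `depth_std` standing in for the depth uncertainty.

```python
import numpy as np


def ray_termination_weights(sigma, t):
    """Discrete ray termination distribution along one ray.

    sigma: (N,) non-negative densities at the sample depths.
    t:     (N,) increasing sample depths along the ray.
    Returns (weights, delta); weights[k] is the probability that the ray
    terminates in the k-th interval.
    """
    delta = np.diff(t)
    delta = np.append(delta, delta[-1])      # reuse the last spacing for the final sample
    alpha = 1.0 - np.exp(-sigma * delta)     # opacity of each interval
    # Transmittance: probability of reaching sample k without terminating earlier.
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
    return trans * alpha, delta


def depth_loss(weights, t, delta, keypoint_depth, depth_std, eps=1e-10):
    """Assumed loss form: negative log termination probability, weighted by a
    Gaussian centered on the keypoint depth (width = depth uncertainty)."""
    gauss = np.exp(-((t - keypoint_depth) ** 2) / (2.0 * depth_std ** 2))
    return float(-np.sum(np.log(weights + eps) * gauss * delta))


# Toy example (hypothetical values): one ray, one keypoint at depth 2.0.
t = np.linspace(0.5, 5.0, 64)


def bump(center):
    """Density concentrated around a single surface at the given depth."""
    return 50.0 * np.exp(-((t - center) ** 2) / (2.0 * 0.05 ** 2))


w_good, delta = ray_termination_weights(bump(2.0), t)   # surface at the keypoint
w_bad, _ = ray_termination_weights(bump(4.0), t)        # surface at the wrong depth

loss_good = depth_loss(w_good, t, delta, keypoint_depth=2.0, depth_std=0.1)
loss_bad = depth_loss(w_bad, t, delta, keypoint_depth=2.0, depth_std=0.1)
print(loss_good < loss_bad)
```

In this toy setup the density field whose surface coincides with the keypoint receives the lower loss, which is the behavior a depth supervision term is meant to reward. In a real pipeline such a term would be added to the usual color reconstruction loss with some weighting factor, and only for rays that pass through a keypoint.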