Abstract: The capabilities of monocular depth estimation (MDE) models are limited by the availability of sufficient and diverse datasets. In the case of MDE models for autonomous driving, this issue is exacerbated by the linearity of the captured data trajectories. We propose a NeRF-based data augmentation pipeline to introduce synthetic data with more diverse viewing directions into training datasets and demonstrate the benefits of our approach to model performance and robustness. Our data augmentation pipeline, which we call \textit{NeRFmentation}, trains NeRFs on each scene in a dataset, filters out subpar NeRFs based on relevant metrics, and uses them to generate synthetic RGB-D images captured from new viewing directions. In this work, we apply our technique in conjunction with three state-of-the-art MDE architectures on the popular autonomous driving dataset, KITTI, augmenting its training set of the Eigen split. We evaluate the resulting performance gain on the original test set, a separate popular driving dataset, and our own synthetic test set.

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper addresses the performance limits that Monocular Depth Estimation (MDE) models face when training data lacks diversity and viewpoint variation. Specifically, the authors focus on:

1. **Dataset Limitations**: Existing MDE models typically rely on large-scale datasets for training, but these datasets often lack sufficient diversity and viewpoint variation. This is particularly true in autonomous driving scenarios, where data trajectories are usually linear, which limits how well the models generalize.
2. **Improving Generalization**: To enhance the generalization ability of MDE models in unseen environments, the authors propose a data augmentation method based on NeRF (Neural Radiance Fields), called NeRFmentation. This method augments existing training datasets by generating synthetic RGB-D images from new viewpoints, thereby improving the robustness and generalization ability of the models.

### Main Contributions

1. **NeRFmentation Data Augmentation Pipeline**: A simple yet effective data augmentation pipeline is proposed that uses NeRFs to generate high-quality RGB-D images. It extends and diversifies existing real-world depth estimation datasets with new viewpoints to increase the robustness of the trained models.
2. **Extensive Experimental Validation**: Various state-of-the-art MDE models trained on the KITTI dataset were evaluated in a zero-shot cross-dataset manner, particularly on the Waymo Open Dataset, demonstrating the effectiveness of NeRFmentation.
3. **Comparison with Other Data Augmentation Techniques**: Experimental results show that NeRFmentation outperforms other data augmentation techniques in both in-distribution and out-of-distribution testing tasks. Insights are provided on when to combine classical training-time augmentation techniques (such as random rotation, flipping, and cropping) with data augmentation.

### Method Overview

1. **Dataset Structure Analysis**: Many real-world image datasets have a sequential structure, capturing the same environment from multiple slightly different viewpoints. Using the provided pose information, a 3D model of the scene can be constructed and used to generate new RGB-D pairs that increase the size of the original dataset.
2. **NeRF Training and Filtering**: Each scene is divided into sub-scenes, and an individual depth-supervised NeRF is trained on each. After training, a small portion of the data is used as a validation set to filter out low-quality NeRFs.
3. **New Viewpoint Synthesis**: New viewpoints are generated by applying slight 3D rigid transformations to the original poses. The trained NeRFs are then used to render dense RGB-D images from these new viewpoints. These images are combined with the source dataset to form an augmented dataset for training downstream monocular depth estimation networks.

### Experimental Results

1. **Datasets**: The experiments primarily used KITTI and the Waymo Open Dataset.
2. **Data Augmentation Baseline Methods**: Comparisons were made with methods such as Fourier Domain Adaptation (FDA) and style transfer.
3. **New Viewpoint Synthesis Strategies**: Five data augmentation strategies were proposed, including re-rendering training poses, interpolating new poses, and horizontal and vertical translations. The impact of depth completion and pose diversification was analyzed.
4. **Model Training and Evaluation**: Various MDE architectures (such as AdaBins, DepthFormer, and BinsFormer) were trained on the augmented KITTI dataset and evaluated on the Waymo and KITTI test sets, demonstrating that NeRFmentation improves model performance and generalization ability.

### Conclusion

NeRFmentation effectively enhances the robustness and generalization ability of MDE models by generating synthetic data from new viewpoints, particularly in unseen environments. This method not only performs well in in-distribution testing tasks but also significantly outperforms other data augmentation techniques in out-of-distribution testing tasks.
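
The New Viewpoint Synthesis step in the method overview, applying a slight 3D rigid transformation to an original pose, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes poses are 4x4 camera-to-world matrices and that the camera's y axis is the vertical axis; the function names `rigid_offset` and `perturb_pose` and the offset values are hypothetical, chosen only for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]  # 4x4 homogeneous transform stored as nested lists


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_offset(dx: float = 0.0, dy: float = 0.0, yaw_deg: float = 0.0) -> Matrix:
    """Build a small rigid transform expressed in the camera's own frame.

    dx, dy: translation along the camera x and y axes (same unit as the poses).
    yaw_deg: rotation about the camera y axis (assumed vertical), in degrees.
    """
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    return [
        [c, 0.0, s, dx],
        [0.0, 1.0, 0.0, dy],
        [-s, 0.0, c, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def perturb_pose(cam_to_world: Matrix, offset: Matrix) -> Matrix:
    """Return a new camera-to-world pose.

    Right-multiplication applies the offset relative to the camera's own axes,
    so dx shifts the camera sideways regardless of its heading in the world.
    """
    return matmul(cam_to_world, offset)


if __name__ == "__main__":
    # Hypothetical original pose: identity rotation, camera at (1, 2, 3).
    pose = [
        [1.0, 0.0, 0.0, 1.0],
        [0.0, 1.0, 0.0, 2.0],
        [0.0, 0.0, 1.0, 3.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
    # Illustrative offsets only; the paper's actual magnitudes are not given here.
    for offset in (rigid_offset(dx=0.5), rigid_offset(dy=-0.2), rigid_offset(yaw_deg=5.0)):
        new_pose = perturb_pose(pose, offset)
        print([round(new_pose[i][3], 3) for i in range(3)])
```

Each perturbed pose would then be passed to the trained NeRF to render an RGB-D pair. Expressing the offset in the camera frame keeps a "horizontal translation" horizontal relative to the vehicle even as the trajectory turns.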

NeRFmentation: NeRF-based Augmentation for Monocular Depth Estimation

A Robust Monocular Depth Estimation Framework Based on Light-Weight ERF-Pspnet for Day-Night Driving Scenes

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

DA4NeRF: Depth-aware augmentation technique for neural radiance fields

DaRF: Boosting Radiance Fields from Sparse Inputs with Monocular Depth Adaptation

Synthetic Data Enhancement and Network Compression Technology of Monocular Depth Estimation for Real-Time Autonomous Driving System

NeRF-Aug: Data Augmentation for Robotics with Neural Radiance Fields

3D Data Augmentation for Driving Scenes on Camera

NeuRAD: Neural Rendering for Autonomous Driving

Aug-NeRF: Training Stronger Neural Radiance Fields with Triple-Level Physically-Grounded Augmentations

MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection

AltNeRF: Learning Robust Neural Radiance Field via Alternating Depth-Pose Optimization

Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos

Exploring augmentation strategies in mixed reality for autonomous driving with depth cameras

Depth-supervised NeRF: Fewer Views and Faster Training for Free

Neural Radiance Fields for Fisheye Driving Scenes Using Edge-Aware Integrated Depth Supervision

MonoPP: Metric-Scaled Self-Supervised Monocular Depth Estimation by Planar-Parallax Geometry in Automotive Applications

HarmonicNeRF: Geometry-Informed Synthetic View Augmentation for 3D Scene Reconstruction in Driving Scenarios

Depth-guided NeRF Training via Earth Mover's Distance

S-NeRF: Neural Radiance Fields for Street Views

UC-NeRF: Neural Radiance Field for Under-Calibrated Multi-view Cameras in Autonomous Driving