NerfBaselines: Consistent and Reproducible Evaluation of Novel View Synthesis Methods

Jonas Kulhanek, Torsten Sattler
2024-06-25
Abstract: Novel view synthesis is an important problem with many applications, including AR/VR, gaming, and simulations for robotics. With the recent rapid development of Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS) methods, it is becoming difficult to keep track of the current state of the art (SoTA) due to methods using different evaluation protocols, codebases being difficult to install and use, and methods not generalizing well to novel 3D scenes. Our experiments support this claim by showing that tiny differences in evaluation protocols of various methods can lead to inconsistent reported metrics. To address these issues, we propose a framework called NerfBaselines, which simplifies the installation of various methods, provides consistent benchmarking tools, and ensures reproducibility. We validate our implementation experimentally by reproducing numbers reported in the original papers. To further improve the accessibility, we release a web platform where commonly used methods are compared on standard benchmarks. Web: https://jkulhanek.com/nerfbaselines
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses several key issues in evaluating and comparing current novel view synthesis methods (such as NeRFs and 3D Gaussian Splatting):

1. **Inconsistent evaluation protocols**: Different studies use different evaluation protocols, which makes their results hard to compare directly. For example, methods may use different image resolutions, different down-sampling parameters, or different metric calculation parameters, and these differences can lead to inconsistent reported metrics.
2. **Complexity of installation and use**: The codebases of many existing methods are difficult to install and have serious dependency conflicts, making it hard for other researchers to reproduce the experimental results. Frequent updates to the codebases make reproduction harder still.
3. **Insufficient generalization to new datasets**: Many methods perform well on the datasets they were developed on but poorly on new, unseen datasets. This is mainly because their implementations make many simplifying assumptions, such as consistent camera intrinsics and a single image size.
4. **Lack of standardized interfaces and data processing procedures**: Methods differ in their interfaces, dataset formats, and coordinate systems, which complicates integration and custom rendering.

To address these problems, the paper proposes a framework named NerfBaselines, which aims to simplify benchmarking and improve reproducibility in the following ways:

- **Unified interface**: Provide a unified API for NeRF and 3DGS methods, and standardize the dataset format and evaluation protocol.
- **Environment isolation**: Install each method in an independent environment to manage dependencies and ensure reproducibility.
- **Online benchmarking platform**: Release a website that compares the performance of various methods on multiple datasets and provides checkpoints and interactive viewing of results.
- **Interactive camera trajectory editor**: Provide a web-based tool for designing and rendering custom camera trajectories, enabling a more comprehensive evaluation of multi-view consistency.

Through these measures, the NerfBaselines framework simplifies the evaluation and comparison of novel view synthesis methods and improves the transparency and reproducibility of research.
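The point about inconsistent evaluation protocols can be made concrete with a small sketch. The example below is not taken from the paper or from the NerfBaselines codebase. It assumes one hypothetical protocol difference (whether a rendered image is rounded to 8-bit values before the metric is computed) and uses synthetic pixel data, with all function names chosen for illustration only. It shows that the same prediction yields two different PSNR values depending on that single choice.

```python
import math
import random


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_value ** 2 / mse)


def quantize_uint8(values):
    """Clamp to [0, 1] and round to 256 levels, as saving to an 8-bit image would."""
    return [round(min(max(v, 0.0), 1.0) * 255) / 255 for v in values]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic "ground truth" pixels, already on the 8-bit grid.
target = quantize_uint8([rng.random() for _ in range(10_000)])

# Hypothetical float-valued render: ground truth plus small Gaussian noise.
pred = [min(max(t + rng.gauss(0.0, 0.01), 0.0), 1.0) for t in target]

# Protocol A: compare the float prediction directly.
psnr_float = psnr(pred, target)

# Protocol B: round the prediction to 8 bits first, then compare.
psnr_uint8 = psnr(quantize_uint8(pred), target)

print(f"PSNR (float prediction): {psnr_float:.4f} dB")
print(f"PSNR (8-bit prediction): {psnr_uint8:.4f} dB")
print(f"Absolute difference:     {abs(psnr_float - psnr_uint8):.4f} dB")
```

The gap in this toy setting is small, but it comes from the protocol alone, not from the method being evaluated. This is the kind of discrepancy that a single, shared evaluation protocol is meant to remove.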