SPARF: Neural Radiance Fields from Sparse and Noisy Poses

Prune Truong, Marie-Julie Rakotosaona, Fabian Manhardt, Federico Tombari
2023-06-13
Abstract: Neural Radiance Field (NeRF) has recently emerged as a powerful representation to synthesize photorealistic novel views. While showing impressive performance, it relies on the availability of dense input views with highly accurate camera poses, thus limiting its application in real-world scenarios. In this work, we introduce Sparse Pose Adjusting Radiance Field (SPARF), to address the challenge of novel-view synthesis given only few wide-baseline input images (as low as 3) with noisy camera poses. Our approach exploits multi-view geometry constraints in order to jointly learn the NeRF and refine the camera poses. By relying on pixel matches extracted between the input views, our multi-view correspondence objective enforces the optimized scene and camera poses to converge to a global and geometrically accurate solution. Our depth consistency loss further encourages the reconstructed scene to be consistent from any viewpoint. Our approach sets a new state of the art in the sparse-view regime on multiple challenging datasets.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses high-quality novel-view synthesis (NVS) from only a few wide-baseline input images (as few as 3) with noisy camera poses. Standard Neural Radiance Field (NeRF) methods perform well when given dense input views and highly accurate camera poses. In practice, however, especially in augmented/virtual reality (AR/VR) or autonomous driving, only sparse views are often available, and their camera poses may contain large errors. These conditions limit the use of NeRF in real-world settings. The paper proposes Sparse Pose Adjusting Radiance Field (SPARF), which uses multi-view geometric constraints to jointly optimize the NeRF model and the camera poses, so that high-quality novel views can be rendered even from few input images with inaccurate poses. Specifically, SPARF addresses these challenges through the following components:
1. **Multi-view correspondence loss**: Pixel correspondences are extracted between the input views and used to build a multi-view correspondence loss, which drives the optimized scene and camera poses toward a global, geometrically accurate solution.
2. **Depth consistency loss**: Depth maps rendered from the training viewpoints are used to create pseudo-depth supervision, which encourages the reconstructed scene to be consistent from any viewpoint and thereby improves rendering quality at novel views.
3. **Staged training framework**: Training is split into two stages. First, the camera poses and the coarse MLP are trained jointly. Then the poses are frozen and only the coarse and fine MLPs are trained, so that the fine network can learn sharp geometry.

Together, these components let SPARF render high-quality novel views in the sparse-view regime, even when the number of input images is extremely small and the camera poses are noisy, which makes NeRF more usable in practical applications.
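
To make the multi-view correspondence idea more concrete, here is a minimal NumPy sketch of a reprojection-style loss. It is an illustration under stated assumptions, not the authors' implementation: the function names (`backproject`, `project`, `correspondence_loss`), the pinhole intrinsics `K`, the camera-to-world pose convention, and the Huber penalty with threshold `delta` are all choices made for this example. In a real pipeline the depth would come from the radiance field's rendering and the loss would be differentiated with respect to both the network and the poses.

```python
import numpy as np

def backproject(pixels, depth, K, cam_to_world):
    """Lift pixels (N, 2) with per-pixel depth (N,) to world-space points (N, 3)."""
    ones = np.ones((pixels.shape[0], 1))
    homog = np.concatenate([pixels, ones], axis=1)   # homogeneous pixel coordinates
    rays_cam = homog @ np.linalg.inv(K).T            # camera-frame directions with z = 1
    pts_cam = rays_cam * depth[:, None]              # scale by depth along the z axis
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t                         # camera frame -> world frame

def project(points_world, K, cam_to_world):
    """Project world-space points (N, 3) into the image of the given camera."""
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    pts_cam = (points_world - t) @ R                 # applies R^T (p - t) row-wise
    uv = pts_cam @ K.T
    return uv[:, :2] / uv[:, 2:3]                    # perspective division

def correspondence_loss(pix_i, pix_j, depth_i, K, pose_i, pose_j, delta=1.0):
    """Mean Huber penalty on the reprojection error of matches (pix_i <-> pix_j).

    Assumption for this sketch: depth_i stands in for the depth rendered at
    pix_i, and poses are 4x4 camera-to-world matrices.
    """
    reproj = project(backproject(pix_i, depth_i, K, pose_i), K, pose_j)
    err = np.linalg.norm(reproj - pix_j, axis=1)
    huber = np.where(err <= delta, 0.5 * err**2, delta * (err - 0.5 * delta))
    return huber.mean()

# Toy setup: two cameras looking down +z, the second shifted 1 unit along x.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
pose_i = np.eye(4)
pose_j = np.eye(4)
pose_j[0, 3] = 1.0

pix_i = np.array([[50.0, 50.0]])
depth_i = np.array([2.0])
pix_j = project(backproject(pix_i, depth_i, K, pose_i), K, pose_j)  # ground-truth match

noisy_pose_j = pose_j.copy()
noisy_pose_j[0, 3] = 1.2                             # perturbed translation

loss_true = correspondence_loss(pix_i, pix_j, depth_i, K, pose_i, pose_j)
loss_noisy = correspondence_loss(pix_i, pix_j, depth_i, K, pose_i, noisy_pose_j)
```

In this toy case the loss vanishes for the true pose and grows when the second camera's translation is perturbed, which is the signal a joint optimization could use to pull noisy poses and scene geometry toward a mutually consistent solution.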