Abstract: We introduce camera ray matching (CRAYM) into the joint optimization of camera poses and neural fields from multi-view images. The optimized field, referred to as a feature volume, can be "probed" by the camera rays for novel view synthesis (NVS) and 3D geometry reconstruction. One key reason for matching camera rays, instead of pixels as in prior works, is that the camera rays can be parameterized by the feature volume to carry both geometric and photometric information. Multi-view consistencies involving the camera rays and scene rendering can be naturally integrated into the joint optimization and network training, to impose physically meaningful constraints to improve the final quality of both the geometric reconstruction and photorealistic rendering. We formulate our per-ray optimization and matched ray coherence by focusing on camera rays passing through keypoints in the input images to elevate both the efficiency and accuracy of scene correspondences. Accumulated ray features along the feature volume provide a means to discount the coherence constraint amid erroneous ray matching. We demonstrate the effectiveness of CRAYM for both NVS and geometry reconstruction, over dense- or sparse-view settings, with qualitative and quantitative comparisons to state-of-the-art alternatives.
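The abstract states that accumulated ray features are used to discount the coherence constraint when a ray match is wrong. The sketch below illustrates that idea under our own assumptions: the accumulated feature is computed with the standard NeRF alpha-compositing quadrature, and the matching weight w is the cosine similarity of the two rays' accumulated features, clamped at zero. The function names are hypothetical and this is not the authors' implementation.

```python
import math


def accumulate_feature(sigmas, feats, deltas):
    """Approximate f(r) = integral of T(t) sigma(r(t)) f(r(t)) dt by alpha compositing.

    sigmas: density at each sample along the ray (near to far).
    feats:  feature vector at each sample (all of the same length).
    deltas: distance between consecutive samples.
    Assumption: the standard NeRF quadrature, not necessarily CRAYM's.
    """
    acc = [0.0] * len(feats[0])
    transmittance = 1.0  # T = exp(-integral of sigma), equal to 1 at the ray origin
    for sigma, feat, delta in zip(sigmas, feats, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for i, value in enumerate(feat):
            acc[i] += weight * value
        transmittance *= 1.0 - alpha  # fraction of light passing this segment
    return acc


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def fuse_matched_color(color_key, color_match, feat_key, feat_match):
    """Fused color w * c(r') + (1 - w) * c(r) for a key ray r and its match r'.

    Assumption: w is the cosine similarity of the accumulated features,
    clamped at 0, so a dissimilar (likely wrong) match barely contributes.
    """
    w = max(0.0, cosine_similarity(feat_key, feat_match))
    fused = [w * cm + (1.0 - w) * ck for ck, cm in zip(color_key, color_match)]
    return fused, w
```

With identical accumulated features the fused color equals the matched ray's color, while orthogonal (or opposing) features leave the key ray's color unchanged, which is the "discounting" behavior the abstract describes.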

What problem does this paper attempt to address?

The problem this paper addresses is how to jointly optimize camera poses and neural fields from multi-view images so as to improve the quality of novel view synthesis (NVS) and 3D geometry reconstruction. Specifically, the paper introduces a new method, Camera RAY Matching (CRAYM), to counter the performance degradation that noisy camera poses cause in existing methods.

### Problem Background

In multi-view 3D reconstruction, accurate camera poses are crucial for producing high-quality novel views and 3D models. In practice, however, camera poses may come from different sources (such as GPS or IMU sensors) and can be noisy, which degrades the reconstruction. Both traditional multi-view stereo (MVS) methods and recent neural field methods (such as NeRF) rely on accurate camera poses, and their performance drops significantly when the poses are inaccurate.

### Solution

CRAYM addresses these problems as follows:

1. **Camera Ray Matching**: Instead of matching pixels, CRAYM matches camera rays. Each ray carries not only a 2D pixel value but also 3D spatial information, which makes it possible to impose explicit geometric constraints when optimizing the camera poses.
2. **Feature Volume**: CRAYM parameterizes camera rays by a feature volume that encodes geometric and photometric information simultaneously. Constraints on matched rays are therefore transferred directly to the feature volume, imposing physically meaningful constraints during the joint optimization.
3. **Multi-view Consistency**: CRAYM exploits multi-view consistency to improve both geometric reconstruction and photorealistic rendering. Specifically, it enforces color and geometric consistency by matching rays that pass through keypoints (key rays), and it uses auxiliary rays to provide context that makes the key rays more robust.
4. **Loss Function Design**: CRAYM introduces two geometric losses, an epipolar loss and a point-alignment loss, to further promote the consistency of matched rays and improve reconstruction quality.

### Summary

By introducing camera ray matching and the feature volume, CRAYM mitigates the effect of camera pose noise on multi-view 3D reconstruction and achieves better results in both novel view synthesis and 3D geometry reconstruction. The advantage is most pronounced on fine-grained details.

### Formula Representation

The key formulas in the paper are:

- **Ray Equation**:
  \[ r(t)=r_{o}+t\,r_{d} \quad (t \geq 0) \]
  where \(r_{o}\) is the camera center and \(r_{d}\) is the unit viewing direction.
- **Accumulated Ray Feature**:
  \[ f(r_{k})=\int_{0}^{\infty} T(t)\,\sigma(p_{k})\,f^{\prime\prime}(p_{k})\,dt, \qquad p_{k}=r_{k}(t) \]
  where \(T(t)=\exp\left(-\int_{0}^{t} \sigma(r_{k}(s))\,ds\right)\) is the accumulated transmittance along the key ray \(r_{k}\).
- **Matched Ray Consistency Module**:
  \[ \hat{c}(r_{k}) = w\,c(r'_{k})+(1 - w)\,c(r_{k}) \]
  where \(\hat{c}(r_{k})\) is the fused color of the key ray \(r_{k}\), \(r'_{k}\) is its matched ray, and \(w\) is the matching credibility, computed from the cosine similarity between the accumulated features \(f(r_{k})\) and \(f(r'_{k})\) so that unreliable matches receive a low weight.
- **Total Loss Function**:
  \[ L=\lambda_{1} L_{p}+\lambda_{2} L_{s}+\lambda_{3} L_{e}+\lambda_{4} L_{a} \]
  where \(L_{p}\) is the photometric loss, \(L_{s}\) is the SSIM loss, \(L_{e}\) is the epipolar loss, \(L_{a}\) is the point-alignment loss, and \(\lambda_{1},\dots,\lambda_{4}\) are weighting coefficients.

With these components, CRAYM jointly optimizes the camera poses and the neural field, yielding high-quality novel views and 3D geometry.
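The summary names the epipolar and point-alignment losses but gives no formulas for them. The sketch below shows one common way to write each, under our own assumptions: matched keypoints are already in normalized camera coordinates, the relative pose (R, t) maps camera-i coordinates to camera-j coordinates, the epipolar term is the algebraic residual x_j^T [t]_x R x_i (the paper may instead use, for example, a Sampson or point-to-epipolar-line distance), and point alignment is the squared distance between the 3D points where two matched rays terminate at their rendered depths. All function names and the default loss weights are hypothetical; this is not the authors' implementation.

```python
def matvec(m, v):
    """3x3 matrix times 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def matmul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) v = t x v."""
    return [[0.0, -t[2], t[1]],
            [t[2], 0.0, -t[0]],
            [-t[1], t[0], 0.0]]


def epipolar_residual(x_i, x_j, R, t):
    """Algebraic epipolar error x_j^T E x_i with essential matrix E = [t]_x R.

    x_i, x_j: matched keypoints in normalized camera coordinates (u, v, 1).
    (R, t) maps camera-i coordinates to camera-j coordinates: X_j = R X_i + t.
    The residual is zero when the match agrees with the relative pose.
    """
    essential = matmul(skew(t), R)
    return sum(a * b for a, b in zip(x_j, matvec(essential, x_i)))


def epipolar_loss(matches, R, t):
    """Mean squared epipolar residual over (x_i, x_j) keypoint matches."""
    residuals = [epipolar_residual(xi, xj, R, t) for xi, xj in matches]
    return sum(r * r for r in residuals) / len(residuals)


def ray_point(origin, direction, t):
    """Point r(t) = r_o + t r_d on a camera ray."""
    return [o + t * d for o, d in zip(origin, direction)]


def point_alignment_loss(ray_i, depth_i, ray_j, depth_j):
    """Squared distance between the points where two matched rays terminate.

    ray_*: (origin, unit direction) in world coordinates; depth_*: rendered
    (expected) depth along each ray. Assumption: squared L2 distance.
    """
    p = ray_point(ray_i[0], ray_i[1], depth_i)
    q = ray_point(ray_j[0], ray_j[1], depth_j)
    return sum((a - b) ** 2 for a, b in zip(p, q))


def total_loss(l_p, l_s, l_e, l_a, lambdas=(1.0, 1.0, 1.0, 1.0)):
    """L = lambda1 L_p + lambda2 L_s + lambda3 L_e + lambda4 L_a (placeholder weights)."""
    return sum(lam * term for lam, term in zip(lambdas, (l_p, l_s, l_e, l_a)))
```

Plain Python lists keep the example dependency-free; in practice these terms would typically be computed on batched tensors in an autodiff framework so that gradients reach both the camera poses and the feature volume.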

CRAYM: Neural Field Optimization via Camera RAY Matching
