Abstract: Neural Radiance Fields (NeRF) have demonstrated impressive potential in synthesizing novel views from dense input; however, their effectiveness is challenged when dealing with sparse input. Existing approaches that incorporate additional depth or semantic supervision can alleviate this issue to an extent. However, collecting such supervision is not only costly but also potentially inaccurate, leading to poor performance and generalization ability in diverse scenarios. In our work, we introduce a novel model, the Collaborative Neural Radiance Fields (ColNeRF), designed to work with sparse input. The collaboration in ColNeRF includes both the cooperation between sparse input images and the cooperation between the outputs of the neural radiance field. Through this, we construct a novel collaborative module that aligns information from various views while imposing self-supervised constraints to ensure multi-view consistency in both geometry and appearance. A Collaborative Cross-View Volume Integration module (CCVI) is proposed to capture complex occlusions and implicitly infer the spatial location of objects. Moreover, we introduce self-supervision of target rays projected in multiple directions to ensure geometric and color consistency in adjacent regions. Benefiting from the collaboration at the input and output ends, ColNeRF is capable of capturing richer and more generalized scene representations, thereby facilitating higher-quality novel view synthesis. Extensive experiments demonstrate that ColNeRF outperforms state-of-the-art sparse input generalizable NeRF methods. Furthermore, our approach adapts well to new scenes through fine-tuning, achieving competitive performance compared to per-scene optimized NeRF-based methods while significantly reducing computational costs.
Our code is available at https://github.com/eezkni/ColNeRF.

FeatureNeRF: Learning Generalizable NeRFs by Distilling Foundation Models

MPS-NeRF: Generalizable 3D Human Rendering from Multiview Images

DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features

NeRF-Loc: Visual Localization with Conditional Neural Radiance Field

OV-NeRF: Open-vocabulary Neural Radiance Fields with Vision and Language Foundation Models for 3D Semantic Understanding

GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding

Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields

ID-NeRF: Indirect Diffusion-guided Neural Radiance Fields for Generalizable View Synthesis

M^2DNeRF: Multi-Modal Decomposition NeRF with 3D Feature Fields

Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields

InsertNeRF: Instilling Generalizability into NeRF with HyperNet Modules

RD-NERF: Neural Robust Distilled Feature Fields for Sparse-View Scene Segmentation

GenPower-NeRF: A Neural Radiance Field Method with Powerful Generalization

GeoNeRF: Generalizing NeRF with Geometry Priors

GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding

ColNeRF: Collaboration for Generalizable Sparse Input Neural Radiance Field

Explicit Correspondence Matching for Generalizable Neural Radiance Fields

TransNeRF: Multi-View Optimization for General Neural Radiance Fields Across Scenes

MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without Camera Pose

Interactive Segment Anything NeRF with Feature Imitation