Abstract: Are camera poses necessary for multi-view 3D modeling? Existing approaches predominantly assume access to accurate camera poses. While this assumption might hold for dense views, accurately estimating camera poses for sparse views is often elusive. Our analysis reveals that noisy estimated poses lead to degraded performance for existing sparse-view 3D modeling methods. To address this issue, we present LEAP, a novel pose-free approach, thereby challenging the prevailing notion that camera poses are indispensable. LEAP discards pose-based operations and learns geometric knowledge from data. LEAP is equipped with a neural volume, which is shared across scenes and is parameterized to encode geometry and texture priors. For each incoming scene, we update the neural volume by aggregating 2D image features in a feature-similarity-driven manner. The updated neural volume is decoded into the radiance field, enabling novel view synthesis from any viewpoint. On both object-centric and scene-level datasets, we show that LEAP significantly outperforms prior methods when they employ predicted poses from state-of-the-art pose estimators. Notably, LEAP performs on par with prior approaches that use ground-truth poses while running $400\times$ faster than PixelNeRF. We show that LEAP generalizes to novel object categories and scenes, and learns knowledge that closely resembles epipolar geometry. Project page: https://hwjiang1510.github.io/LEAP/
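The abstract states that the updated neural volume is decoded into a radiance field for novel view synthesis, but it does not describe the renderer. As a rough illustration only, the sketch below shows the standard emission-absorption compositing commonly used to turn radiance-field samples along one ray into a pixel color. The function name `render_ray`, the uniform step size, and the toy sample values are assumptions made for this example and are not taken from LEAP.

```python
import math


def render_ray(densities, colors, step):
    """Composite samples along one ray, front to back.

    densities: non-negative density at each sample (hypothetical values)
    colors:    RGB triple at each sample
    step:      distance between consecutive samples (assumed uniform)
    """
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb in zip(densities, colors):
        # Opacity contributed by this sample.
        alpha = 1.0 - math.exp(-sigma * step)
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, rgb)]
        # Light remaining for the samples behind this one.
        transmittance *= 1.0 - alpha
    return pixel, transmittance


# Toy ray: empty space (green, zero density) followed by a dense red surface.
toy_densities = [0.0, 1000.0]
toy_colors = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
toy_pixel, toy_remaining = render_ray(toy_densities, toy_colors, step=0.1)
```

In this toy case the zero-density sample contributes nothing, so the pixel takes the color of the dense sample behind it.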

What problem does this paper attempt to address?

This paper asks whether multi-view 3D modeling must rely on accurate camera poses. Most existing methods assume that accurate poses are available, but this assumption often fails for sparse views, where estimating camera poses accurately is very challenging. The paper shows that noisy estimated poses degrade the performance of existing sparse-view 3D modeling methods. It therefore proposes LEAP (Liberate Sparse-view 3D Modeling from Camera Poses), a pose-free method that removes the dependence on camera poses and challenges the view that they are indispensable for 3D modeling.

The main contributions of LEAP are as follows:

1. **Pose independence**: LEAP abandons operations that explicitly use camera poses, such as projection, and instead learns pose-related geometric knowledge and representations from data.
2. **Neural volume**: LEAP introduces a neural volume that is shared across scenes and parameterized to encode geometry and texture priors. For each input scene, the neural volume is updated by aggregating 2D image features according to feature similarity.
3. **Fast inference**: LEAP predicts the radiance field in a single forward pass, with no per-scene optimization, which lets it run in less than one second on a single consumer-level GPU.
4. **Strong generalization**: LEAP accurately models objects from new categories, and a model trained on large object-centric datasets transfers well to the scene-level DTU dataset.

In summary, LEAP addresses a key problem in sparse-view 3D modeling: how to perform high-quality 3D modeling without accurate camera poses. According to the paper, the pose-free paradigm significantly outperforms prior methods that rely on predicted poses, performs on par with methods that use ground-truth poses, and improves inference speed and generalization.
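The feature-similarity-driven update of the neural volume can be pictured as an attention-style aggregation. The following is a minimal sketch under stated assumptions, not LEAP's actual architecture: the function names `softmax` and `aggregate`, the residual update, the scaled dot-product similarity, and the toy 2-dimensional features are all hypothetical choices made for illustration. The point it shows is that each voxel embedding pulls in 2D image tokens weighted only by feature similarity, with no camera pose or projection involved.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(volume_queries, image_tokens):
    """Update each voxel embedding with a similarity-weighted sum of 2D tokens.

    volume_queries: list of voxel embeddings (the shared, learned volume)
    image_tokens:   list of 2D image features pooled from all input views
    No camera pose is used: weights depend only on feature similarity.
    """
    dim = len(volume_queries[0])
    scale = math.sqrt(dim)
    updated = []
    for query in volume_queries:
        # Scaled dot-product similarity between this voxel and every token.
        scores = [
            sum(q * t for q, t in zip(query, token)) / scale
            for token in image_tokens
        ]
        weights = softmax(scores)
        # Similarity-weighted average of the image tokens.
        pooled = [
            sum(w * token[i] for w, token in zip(weights, image_tokens))
            for i in range(dim)
        ]
        # Residual update (an assumption of this sketch).
        updated.append([q + p for q, p in zip(query, pooled)])
    return updated


# Toy example: two voxel embeddings and three image tokens from any views.
toy_volume = [[1.0, 0.0], [0.0, 1.0]]
toy_tokens = [[4.0, 0.0], [0.0, 4.0], [2.0, 2.0]]
toy_updated = aggregate(toy_volume, toy_tokens)
```

Because the tokens are pooled and weighted by similarity alone, shuffling the order of the input tokens (and hence of the views) leaves the result unchanged up to floating-point rounding, which is consistent with a design that does not need to know where each view was taken from.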

LEAP: Liberate Sparse-view 3D Modeling from Camera Poses

LEAPSE: Learning Environment Affordances for 3D Human Pose and Shape Estimation

SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views

SparseLGS: Sparse View Language Embedded Gaussian Splatting

SPARF: Neural Radiance Fields from Sparse and Noisy Poses

MetaCap: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering

A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose

Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis

ID-Pose: Sparse-view Camera Pose Estimation by Inverting Diffusion Models

SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction

Leveraging SE(3) Equivariance for Self-Supervised Category-Level Object Pose Estimation

SRPose: Two-view Relative Pose Estimation with Sparse Keypoints

No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images

SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization

Structure-aware neural radiance fields without posed camera

VirtualPose: Learning Generalizable 3D Human Pose Models from Virtual Data

Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization

Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video

Lifting by Image -- Leveraging Image Cues for Accurate 3D Human Pose Estimation

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation