Few-View Object Reconstruction with Unknown Categories and Camera Poses

Hanwen Jiang, Zhenyu Jiang, Kristen Grauman, Yuke Zhu

2024-01-26

Abstract: While object reconstruction has made great strides in recent years, current methods typically require densely captured images and/or known camera poses, and generalize poorly to novel object categories. To step toward object reconstruction in the wild, this work explores reconstructing general real-world objects from a few images without known camera poses or object categories. The crux of our work is solving two fundamental 3D vision problems -- shape reconstruction and pose estimation -- in a unified approach. Our approach captures the synergies of these two problems: reliable camera pose estimation gives rise to accurate shape reconstruction, and the accurate reconstruction, in turn, induces robust correspondence between different views and facilitates pose estimation. Our method FORGE predicts 3D features from each view and leverages them in conjunction with the input images to establish cross-view correspondence for estimating relative camera poses. The 3D features are then transformed by the estimated poses into a shared space and are fused into a neural radiance field. The reconstruction results are rendered by volume rendering techniques, enabling us to train the model without 3D shape ground-truth. Our experiments show that FORGE reliably reconstructs objects from five views. Our pose estimation method outperforms existing ones by a large margin. The reconstruction results under predicted poses are comparable to the ones using ground-truth poses. The performance on novel testing categories matches the results on categories seen during training. Project page: https://ut-austin-rpl.github.io/FORGE/

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses the problem of reconstructing real-world objects from a few views when neither the object category nor the camera poses are known. Specifically:

1. **Object Reconstruction from Few Views**:
   - Current methods typically require densely captured images and/or known camera poses, and they generalize poorly to new object categories.
   - The paper proposes FORGE (Few-view Object Reconstruction that GEneralizes), which reconstructs objects from a few input views without relying on object category or camera pose information.
2. **Relative Camera Pose Estimation**:
   - Existing camera pose estimation methods mainly rely on correlations between 2D images, but with only a few views, the large changes in camera pose make 2D correlations hard to establish.
   - The paper designs a relative pose estimator that takes both 3D features and 2D images as input, using correlations between 3D features to eliminate re-projection ambiguity.
3. **Synergy between Shape Reconstruction and Pose Estimation**:
   - FORGE exploits the synergy between shape reconstruction and pose estimation to improve the performance of both.
   - The model is first trained with ground-truth camera poses to learn 3D geometric priors; the relative camera pose estimator is then trained to build accurate 3D correlations in the resulting view-consistent 3D feature space.

### Main Contributions

1. Proposes a method that jointly estimates the camera poses of the input images and reconstructs the object, and that generalizes to new object categories.
2. Introduces a camera pose estimation method that handles large camera pose changes between a few input views.
3. Demonstrates experimentally that the synergy between reconstruction and pose estimation significantly improves the quality of both tasks.
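The abstract notes that per-view 3D features are transformed by the estimated relative poses into a shared space before fusion. The sketch below illustrates only that geometric step on plain 3D points, under stated assumptions: it is not FORGE's implementation, the helper names (`rot_z`, `apply_pose`, `invert_pose`, `fuse`) and the example poses are hypothetical, and simple averaging stands in for the learned feature fusion.

```python
import math

def rot_z(theta):
    """Rotation matrix about the z-axis (hypothetical helper for this sketch)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_pose(R, t, p):
    """Map point p from a view's frame into the target frame: R @ p + t."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def invert_pose(R, t):
    """Inverse of a rigid transform (R, t) is (R^T, -R^T t)."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return Rt, t_inv

def fuse(points):
    """Average aligned points; a stand-in for learned feature fusion."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(3)]

# A point expressed in the shared (first-view) frame.
p_shared = [1.0, 2.0, 3.0]

# Assumed relative poses mapping each view's frame into the shared frame.
# The first view is the reference, so its pose is the identity.
poses = [
    (rot_z(0.0), [0.0, 0.0, 0.0]),
    (rot_z(math.pi / 2), [0.5, 0.0, -1.0]),
    (rot_z(-math.pi / 3), [-0.2, 1.0, 0.3]),
]

# Simulate what each view observes by applying the inverse pose.
observed = [apply_pose(*invert_pose(R, t), p_shared) for R, t in poses]

# Align every observation back into the shared frame, then fuse.
aligned = [apply_pose(R, t, q) for (R, t), q in zip(poses, observed)]
fused = fuse(aligned)
```

With exact poses, every aligned observation coincides with the original point, so the fused result equals `p_shared` up to floating-point error. Errors in the estimated poses would scatter the aligned observations, which is one way to see why the summary ties reconstruction quality to pose accuracy.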

MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion

Pose Estimation and Neural Implicit Reconstruction Towards Non-Cooperative Spacecraft Without Offline Prior Information

In-Hand 3D Object Reconstruction from a Monocular RGB Video

Recurrent Volume-Based 3-D Feature Fusion for Real-Time Multiview Object Pose Estimation.

Recurrent Volume-based 3D Feature Fusion for Real-time Multi-view Object Pose Estimation

Visual Odometry Based 3D-Reconstruction

Enhanced 3D Shape Reconstruction With Knowledge Graph of Category Concept

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

Active Object Reconstruction Using a Guided View Planner

Generic Objects as Pose Probes for Few-Shot View Synthesis

StrobeNet: Category-Level Multiview Reconstruction of Articulated Objects

FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models

Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

Single-view 3D Mesh Reconstruction for Seen and Unseen Categories

Data-Driven 3D Reconstruction of Dressed Humans From Sparse Views

Single-view 3D Scene Reconstruction with High-fidelity Shape and Texture

SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views

Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera

AutoRecon: Automated 3D Object Discovery and Reconstruction.

A Pose-only Solution to Visual Reconstruction and Navigation