Abstract: Unseen object pose estimation methods often rely on CAD models or multiple reference views, making the onboarding stage costly. To simplify reference acquisition, we aim to estimate the unseen object's pose through a single unposed RGB-D reference image. While previous works leverage reference images as pose anchors to limit the range of relative pose, our scenario presents significant challenges since the relative transformation could vary across the entire SE(3) space. Moreover, factors like occlusion, sensor noise, and extreme geometry could result in low viewpoint overlap. To address these challenges, we present a novel approach and benchmark, termed UNOPose, for unseen one-reference-based object pose estimation. Building upon a coarse-to-fine paradigm, UNOPose constructs an SE(3)-invariant reference frame to standardize object representation despite pose and size variations. To alleviate small overlap across viewpoints, we recalibrate the weight of each correspondence based on its predicted likelihood of being within the overlapping region. Evaluated on our proposed benchmark based on the BOP Challenge, UNOPose demonstrates superior performance, significantly outperforming traditional and learning-based methods in the one-reference setting and remaining competitive with CAD-model-based methods. The code and dataset will be available.
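The abstract's idea of weighting each correspondence by its predicted likelihood of lying in the overlapping region can be illustrated with a weighted rigid fit. The sketch below is an assumption made for illustration, not the solver used in UNOPose: it applies the standard weighted Kabsch (SVD) alignment, and the function name `weighted_rigid_fit` and the `weights` array (standing in for predicted overlap scores) are hypothetical.

```python
import numpy as np


def weighted_rigid_fit(src, dst, weights):
    """Return (R, t) minimizing sum_i w_i * ||R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) arrays of corresponding 3D points.
    weights:  (N,) non-negative scores, here a stand-in for a predicted
              overlap likelihood per correspondence (an assumption).
    """
    w = weights / weights.sum()

    # Weighted centroids of both point sets.
    src_mean = w @ src
    dst_mean = w @ dst
    src_c = src - src_mean
    dst_c = dst - dst_mean

    # Weighted cross-covariance; low-weight pairs contribute little.
    H = (src_c * w[:, None]).T @ dst_c
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t
```

With weights near zero for correspondences outside the overlap, those pairs barely influence the estimated rotation and translation, which is the intuition behind recalibrating the weights.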

What problem does this paper attempt to address?

The problem this paper addresses is **unseen object pose estimation**, specifically when only a single RGB-D reference image with no known pose (an unposed reference) is available. Existing methods usually rely on CAD models or multiple reference views, which makes onboarding new objects costly and complicated. To simplify reference acquisition, the paper proposes a new method and benchmark called **UNOPose**, which estimates the pose of an unseen object from a single unposed RGB-D reference image.

### Main Challenges

1. **Arbitrariness of Relative Pose**: Without a known pose anchor, the relative pose can vary across the entire SE(3) space.
2. **Partial Matching Problem**: Occlusion, sensor noise, and extreme geometries can leave only a very small overlapping area between viewpoints, which makes matching difficult.
3. **Uncertainty of Absolute Pose**: For unseen objects, the absolute pose is not well-defined without an explicit canonical frame, whereas relative pose estimation does not have this limitation.

### Solutions

To address these challenges, the authors propose the following innovations:

- **SE(3)-Invariant Global Reference Frame (GRF)**: Removes the influence of pose and size changes by standardizing the object representation.
- **Local Reference Frame (LRF) Encoding**: Captures fine-grained geometric details to improve matching accuracy.
- **Overlap Predictor**: Identifies overlapping areas and focuses processing on them to ensure reliable correspondences.
- **Coarse-to-Fine Paradigm**: Starts from a rough pose initialization and refines it step by step into a high-precision pose estimate.

### Experimental Results

Experiments show that UNOPose significantly outperforms traditional and learning-based methods in the single-reference-image setting and is comparable to CAD-model-based methods in some cases. In addition, the authors construct a new benchmark to support future research.

### Summary

The main contributions of this paper are:

1. Proposing the first method for 6DoF pose estimation of unseen objects from a single unposed RGB-D reference image.
2. Constructing a new benchmark specifically for evaluating unseen object segmentation and pose estimation with a single reference image.
3. Introducing SE(3)-invariant global and local reference frames to obtain a standardized object representation, and improving the network's generalization ability through the overlap predictor.

Through these innovations, UNOPose improves the accuracy and robustness of pose estimation while reducing the cost of onboarding new objects.
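The idea of a reference frame that does not depend on pose or size can be illustrated with a generic canonicalization. The summary does not say how UNOPose builds its frame, so the sketch below is only an assumption: it uses a common PCA-based construction (centroid removal, scale normalization, principal axes with sign disambiguation), and the helper name `canonicalize` is hypothetical.

```python
import numpy as np


def canonicalize(points):
    """Express an (N, 3) point cloud in a pose- and scale-normalized frame.

    Generic PCA-based illustration (an assumption), not the frame
    construction from the paper.
    """
    # Remove translation: move the centroid to the origin.
    centered = points - points.mean(axis=0)
    # Remove size: scale so the farthest point lies on the unit sphere.
    centered = centered / np.linalg.norm(centered, axis=1).max()

    # Principal axes of the covariance rotate together with the object.
    cov = centered.T @ centered / len(centered)
    _, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    axes = eigvecs[:, ::-1].copy()        # largest-variance axis first

    # Eigenvectors are defined only up to sign; fix the first two axes
    # using the third moment of the projected coordinates.
    for k in range(2):
        if np.sum((centered @ axes[:, k]) ** 3) < 0:
            axes[:, k] = -axes[:, k]
    # Third axis from a cross product keeps the frame right-handed.
    axes[:, 2] = np.cross(axes[:, 0], axes[:, 1])

    # Coordinates of every point in the canonical frame.
    return centered @ axes
```

Applying any rotation, translation, and uniform scaling to the input leaves the output unchanged, provided the cloud has distinct principal variances and a non-zero third moment along the first two axes. Symmetric or near-symmetric shapes break this simple construction.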

UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image

Pose Estimation and Neural Implicit Reconstruction Towards Non-Cooperative Spacecraft Without Offline Prior Information

Unseen Object Pose Estimation via Registration

Zero-Shot 3D Pose Estimation of Unseen Object by Two-Step RGB-D Fusion

Learning Stereopsis from Geometric Synthesis for 6D Object Pose Estimation

Temporal Consistent Object Pose Estimation from Monocular Videos

OnePose: One-Shot Object Pose Estimation Without CAD Models

FoundPose: Unseen Object Pose Estimation with Foundation Features

Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference

Efficient Encoding and Aligning Viewpoints for 6D Pose Estimation of Unseen Industrial Parts

A Method for Unseen Object Six Degrees of Freedom Pose Estimation Based on Segment Anything Model and Hybrid Distance Optimization

Leaping from 2D Detection to Efficient 6DoF Object Pose Estimation

OnePose++: Keypoint-Free One-Shot Object Pose Estimation without CAD Models

SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation

OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB

SEMPose: A Single End-to-end Network for Multi-object Pose Estimation

SUP-NeRF: A Streamlined Unification of Pose Estimation and NeRF for Monocular 3D Object Reconstruction

Occluded object 6D pose estimation using foreground probability compensation

RNNPose: 6-DoF Object Pose Estimation Via Recurrent Correspondence Field Estimation and Pose Optimization

SRPose: Two-view Relative Pose Estimation with Sparse Keypoints

NeRF-Pose: A First-Reconstruct-Then-Regress Approach for Weakly-supervised 6D Object Pose Estimation