Abstract: 3D object pose estimation is a critical task in many real-world applications, e.g., robotic manipulation and augmented reality. Most existing methods focus on estimating the pose of object instances or categories that have been seen during training. However, real-world applications often require estimating the pose of unseen objects without re-training the network. Therefore, we propose a 3D pose estimation method for unseen objects that requires no re-training. Specifically, given the CAD model of an unseen object, a set of template RGB-D images (RGB images and depth images) is rendered from different viewpoints. A feature embedding network, named PoseFusion, is then designed to extract scene features. In this network, the RGB images and depth images are used to extract texture features and geometric features, respectively. A cross-modality alignment module is then proposed to eliminate the noise in each single modality, and the aligned texture and geometric features are fused through a geometry-guided fusion module. With PoseFusion, the template RGB-D images rendered from the CAD model are abstracted into a set of template scene features, and the RGB-D images captured of the unseen object are embedded into query scene features. Finally, the query scene features are matched with the template scene features by computing a masked local similarity, and the identity and pose of the unseen object are determined by the most similar template. Experiments on the LINEMOD and T-LESS datasets demonstrate that our method outperforms the compared methods and generalizes better to unseen objects. Extensive ablation studies verify the effectiveness of PoseFusion.
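The final matching step can be illustrated with a short sketch. The abstract does not define the masked local similarity precisely, so the code below assumes one plausible form: each scene feature is an H×W grid of D-dimensional local descriptors, the mask is the binary object mask of the rendered template, and the score is the mean cosine similarity between query and template descriptors at the masked locations. The `Template` container, the function names, and the (azimuth, elevation) pose tuple are hypothetical and serve only this illustration; this is not the actual PoseFusion implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Assumed layout: a feature map is an H x W grid of D-dimensional local vectors.
FeatureMap = List[List[List[float]]]
Mask = List[List[int]]


@dataclass
class Template:
    """One rendered view of a CAD model (hypothetical container)."""
    object_id: str            # identity of the CAD model
    pose: Tuple[float, ...]   # rendering viewpoint, e.g. (azimuth, elevation)
    features: FeatureMap      # template scene feature from the embedding network
    mask: Mask                # binary object mask of the rendered view


def cosine(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity; eps avoids division by zero for all-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)


def masked_local_similarity(query: FeatureMap, template: FeatureMap, mask: Mask) -> float:
    """Mean per-location cosine similarity, restricted to locations where mask == 1."""
    total, count = 0.0, 0
    for i, row in enumerate(mask):
        for j, m in enumerate(row):
            if m:
                total += cosine(query[i][j], template[i][j])
                count += 1
    # An empty mask carries no evidence, so it can never be the best match.
    return total / count if count else float("-inf")


def match(query: FeatureMap, templates: List[Template]) -> Tuple[str, Tuple[float, ...], float]:
    """Return (object_id, pose, score) of the most similar template."""
    scored = [(masked_local_similarity(query, t.features, t.mask), t) for t in templates]
    score, best = max(scored, key=lambda st: st[0])
    return best.object_id, best.pose, score
```

Because every template carries both an object identity and a rendering viewpoint, a single arg-max over the scores yields both the identity and the pose, mirroring the retrieval step described above. In practice the feature maps would come from the embedding network and the loops would be vectorized.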
