Abstract: 3D object pose estimation is a critical task in many real-world applications, e.g., robotic manipulation and augmented reality. Most existing methods focus on estimating the pose of object instances or categories that were seen in the training phase. In real-world deployment, however, the pose of unseen objects must be estimated without re-training the network. We therefore propose a 3D pose estimation method for unseen objects that requires no re-training. Specifically, given the CAD model of the unseen object, a set of template RGB-D images (RGB images and depth images) is rendered at different viewpoints. A feature embedding network, named PoseFusion, is then designed to extract the scene feature. In this network, texture features are extracted from the RGB images and geometric features from the depth images. A cross-modality alignment module is then applied to suppress the noise present in each single modality, and the aligned texture and geometric features are fused through a geometry-guided fusion module. PoseFusion thus abstracts the template RGB-D images generated from the CAD model into a set of template scene features, and embeds the captured RGB-D images of the unseen object into query scene features in the same way. Finally, the query scene features are matched against the template scene features by calculating the masked local similarity, and the identity and pose of the unseen object are determined by the most similar template. Experiments on the LINEMOD and T-LESS datasets demonstrate that our method outperforms the compared methods and generalizes better to unseen objects. Extensive ablation studies verify the effectiveness of PoseFusion.
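The final matching step can be illustrated with a minimal sketch. The abstract does not define the masked local similarity, so the version below is an assumption: it takes the similarity to be the per-location cosine similarity between query and template feature maps, averaged over each template's object mask. The function names, the array shapes, and the `labels` list of (object id, pose) pairs are hypothetical and chosen only for illustration.

```python
import numpy as np


def masked_local_similarity(query, templates, masks, eps=1e-8):
    """Score every template against one query feature map.

    Assumed shapes (hypothetical, not taken from the paper):
      query:     (C, H, W)    query scene feature
      templates: (N, C, H, W) template scene features
      masks:     (N, H, W)    1 where the template shows the object, else 0
    Returns an array of N scores, one per template.
    """
    # L2-normalise the channel vector at every spatial location.
    q = query / (np.linalg.norm(query, axis=0, keepdims=True) + eps)
    t = templates / (np.linalg.norm(templates, axis=1, keepdims=True) + eps)

    # Cosine similarity at each location, for each template: (N, H, W).
    local = np.einsum("chw,nchw->nhw", q, t)

    # Average the local similarities over each template's mask only.
    masks = masks.astype(local.dtype)
    return (local * masks).sum(axis=(1, 2)) / (masks.sum(axis=(1, 2)) + eps)


def match_template(query, templates, masks, labels):
    """Return the label of the most similar template and all scores.

    labels[i] is whatever identifies template i, for example an
    (object_id, pose) pair recorded when the template was rendered.
    """
    scores = masked_local_similarity(query, templates, masks)
    best = int(np.argmax(scores))
    return labels[best], scores
```

Calling `match_template(query, templates, masks, labels)` returns the label of the highest-scoring template together with the full score vector, which mirrors the abstract's statement that identity and pose are read off the most similar template. Restricting the average to the mask keeps background locations from influencing the score; whether the paper normalises or aggregates in exactly this way is not stated in the abstract.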
