Abstract: Category-level object pose estimation aims to predict the 6D pose and 3D metric size of objects from given categories. Due to significant intra-class shape variations among different instances, existing methods have mainly focused on estimating dense correspondences between observed point clouds and their canonical representations, i.e., the normalized object coordinate space (NOCS). Subsequently, a similarity transformation is applied to recover the object pose and size. Despite these efforts, current approaches still cannot fully exploit the geometric features intrinsic to individual instances, which limits their ability to handle objects with complex structures (e.g., cameras). To overcome this issue, this paper introduces GPT-COPE, which leverages a graph-guided point transformer to explore distinctive geometric features from the observed point cloud. Specifically, GPT-COPE employs a Graph-Guided Attention Encoder to extract multiscale geometric features in a local-to-global manner and an Iterative Non-Parametric Decoder to aggregate the multiscale geometric features from finer scales to coarser scales without learnable parameters. After the aggregated geometric features are obtained, the object NOCS coordinates and shape are regressed through the shape prior adaptation mechanism, and the object pose and size are obtained using the Umeyama algorithm. The multiscale network design enables the network to perceive the overall shape and structural information of the object, which is beneficial for handling objects with complex structures. Experimental results on the NOCS-REAL and NOCS-CAMERA datasets demonstrate that GPT-COPE achieves state-of-the-art performance and significantly outperforms existing methods. Furthermore, GPT-COPE shows superior generalization ability compared to existing methods on the large-scale in-the-wild dataset Wild6D and achieves better performance on the REDWOOD75 dataset, which involves objects with unconstrained orientations.
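The final step named in the abstract, recovering pose and size from NOCS correspondences with the Umeyama algorithm, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `umeyama_similarity` and the argument names are hypothetical, the correspondences are assumed to be given and outlier-free, and any outlier rejection (for example RANSAC) that a full pipeline might use is omitted.

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Least-squares similarity transform (hypothetical helper).

    Finds scale s, rotation R, translation t minimizing
    sum_i || dst_i - (s * R @ src_i + t) ||^2  (Umeyama, 1991).
    src: (N, 3) canonical (e.g. predicted NOCS) coordinates.
    dst: (N, 3) corresponding observed points in the camera frame.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]

    # Center both point sets.
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance between target and source, and its SVD.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Reflection guard: force det(R) = +1.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n          # variance of the source set
    s = np.trace(np.diag(D) @ S) / var_src    # isotropic scale
    t = mu_dst - s * R @ mu_src
    return s, R, t

# Usage on synthetic data: a known transform applied to random canonical points.
rng = np.random.default_rng(0)
nocs = rng.uniform(-0.5, 0.5, size=(100, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1.0                           # make Q a proper rotation
s_true, t_true = 0.3, np.array([0.1, -0.2, 0.8])
observed = s_true * nocs @ Q.T + t_true

s_est, R_est, t_est = umeyama_similarity(nocs, observed)
```

In this sketch the recovered rotation and translation play the role of the 6D pose, while the scale relates the normalized canonical coordinates to metric size; how the paper converts that scale into the reported 3D size is not specified in the abstract.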

Category-Level Object Pose Estimation with Statistic Attention

3D Point-to-Keypoint Voting Network for 6D Pose Estimation

Attention Guided 6D Object Pose Estimation with Multi-constraints Voting Network

KGNet: Knowledge-Guided Networks for Category-Level 6D Object Pose and Size Estimation

HS-Pose: Hybrid Scope Feature Extraction for Category-level Object Pose Estimation

LA-Net: An End-to-End Category-Level Object Attitude Estimation Network Based on Multi-Scale Feature Fusion and an Attention Mechanism

SCAPE: A Simple and Strong Category-Agnostic Pose Estimator

Category Level Object Pose Estimation via Global High-Order Pooling

GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence

PAM: Point-wise Attention Module for 6D Object Pose Estimation

Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation

Boosting Monocular 3D Human Pose Estimation with Part Aware Attention

GPT-COPE: A Graph-Guided Point Transformer for Category-Level Object Pose Estimation

SAR-Net: Shape Alignment and Recovery Network for Category-level 6D Object Pose and Size Estimation

Leveraging SE(3) Equivariance for Self-Supervised Category-Level Object Pose Estimation

Exploiting Point-Wise Attention in 6D Object Pose Estimation Based on Bidirectional Prediction

Fine segmentation and difference-aware shape adjustment for category-level 6DoF object pose estimation

Densely Connected Attentional Pyramid Residual Network for Human Pose Estimation

Simplified-attention Enhanced Graph Convolutional Network for 3D human pose estimation

SANet: A novel segmented attention mechanism and multi-level information fusion network for 6D object pose estimation

PANet: A Pixel-Level Attention Network for 6D Pose Estimation With Embedding Vector Features