Abstract: 6D Object Pose Estimation is a crucial yet challenging task in computer vision, suffering from a significant lack of large-scale datasets. This scarcity impedes comprehensive evaluation of model performance, limiting research advancements. Furthermore, the restricted number of available instances or categories curtails its applications. To address these issues, this paper introduces Omni6DPose, a substantial dataset characterized by its diversity in object categories, large scale, and variety in object materials. Omni6DPose is divided into three main components: ROPE (Real 6D Object Pose Estimation Dataset), which includes 332K images annotated with over 1.5M annotations across 581 instances in 149 categories; SOPE (Simulated 6D Object Pose Estimation Dataset), consisting of 475K images created in a mixed reality setting with depth simulation, annotated with over 5M annotations across 4162 instances in the same 149 categories; and the manually aligned real scanned objects used in both ROPE and SOPE. Omni6DPose is inherently challenging due to its substantial variations and ambiguities. To address this challenge, we introduce GenPose++, an enhanced version of the SOTA category-level pose estimation framework, incorporating two pivotal improvements: semantic-aware feature extraction and clustering-based aggregation. Moreover, we provide a comprehensive benchmarking analysis to evaluate the performance of previous methods on this large-scale dataset in the realms of 6D object pose estimation and pose tracking.

What problem does this paper attempt to address?

This paper addresses several key challenges in 6D object pose estimation and pose tracking. Specifically, it targets the following issues:

1. **Lack of large-scale datasets**:
   - Research on 6D object pose estimation is held back by the lack of large-scale datasets, which prevents comprehensive evaluation of model performance and slows research progress.
   - Existing datasets cover a limited number of instances or categories, which restricts their range of application.
2. **Insufficient diversity and complexity of datasets**:
   - Existing datasets lack diversity in object categories, materials, and scene variations, so they do not fully reflect the complexity of the real world.
3. **Limitations of model performance evaluation**:
   - There is no comprehensive benchmark for evaluating existing methods on 6D object pose estimation and pose tracking.

To address these problems, the paper introduces a new dataset and a model framework:

- **Omni6DPose dataset**: A 6D object pose estimation dataset with diverse categories, instances, and materials, divided into three main parts:
  - **ROPE (Real 6D Object Pose Estimation Dataset)**: 332,000 images with more than 1.5 million annotations, covering 581 instances in 149 categories.
  - **SOPE (Simulated 6D Object Pose Estimation Dataset)**: 475,000 images generated in a mixed-reality setting, with more than 5 million annotations, covering 4,162 instances in the same 149 categories.
  - Manually aligned real-scanned objects used in both ROPE and SOPE.
- **GenPose++ model**: An enhanced version of the SOTA category-level pose estimation framework, with two key improvements:
  - **Semantic-aware feature extraction**: It combines RGB images and point cloud information to improve the accuracy of feature extraction.
  - **Clustering-based aggregation**: It uses a clustering algorithm to handle multimodal pose distributions, which addresses pose estimation for objects with non-continuous symmetries.

With these contributions, the paper provides a richer and more challenging dataset together with a stronger model framework, with the aim of advancing 6D object pose estimation and pose tracking.
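
The idea behind clustering-based aggregation can be illustrated with a small NumPy sketch. This is not the GenPose++ implementation: the summary above only states that a clustering algorithm is used, so the neighbour-counting scheme, the 15° radius, and every function name below are assumptions made for illustration. The example builds pose candidates that fall into two modes (0° and 90° about the z axis, as might happen for an object with a discrete symmetry) and compares averaging all candidates with averaging only the densest cluster.

```python
import numpy as np


def rotation_z(angle_rad):
    """Rotation matrix for a rotation of angle_rad about the z axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rotation_angle_deg(rotation):
    """Angle (degrees) of a single 3x3 rotation matrix."""
    cos_angle = np.clip((np.trace(rotation) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))


def pairwise_geodesic(rotations):
    """Pairwise rotation angles (radians) between (N, 3, 3) rotations."""
    # trace(R_i^T R_j) equals the sum of the elementwise product of R_i and R_j.
    traces = np.einsum("iab,jab->ij", rotations, rotations)
    return np.arccos(np.clip((traces - 1.0) / 2.0, -1.0, 1.0))


def project_to_so3(matrix):
    """Closest rotation matrix (in Frobenius norm) to a 3x3 matrix, via SVD."""
    u, _, vt = np.linalg.svd(matrix)
    d = 1.0 if np.linalg.det(u @ vt) >= 0 else -1.0
    return u @ np.diag([1.0, 1.0, d]) @ vt


def aggregate_by_clustering(rotations, translations, radius_rad=np.deg2rad(15.0)):
    """Average only the candidates in the densest neighbourhood (assumed scheme)."""
    neighbours = pairwise_geodesic(rotations) < radius_rad  # (N, N) adjacency
    centre = int(np.argmax(neighbours.sum(axis=1)))  # candidate with most neighbours
    members = neighbours[centre]
    mean_rotation = project_to_so3(rotations[members].mean(axis=0))
    mean_translation = translations[members].mean(axis=0)
    return mean_rotation, mean_translation, members


# Synthetic candidates: 12 near 0 degrees and 8 near 90 degrees about z.
rng = np.random.default_rng(0)
angles_deg = np.concatenate([
    rng.uniform(-2.0, 2.0, size=12),
    90.0 + rng.uniform(-2.0, 2.0, size=8),
])
rotations = np.stack([rotation_z(np.deg2rad(a)) for a in angles_deg])
translations = np.array([0.0, 0.0, 0.5]) + rng.normal(scale=1e-3, size=(20, 3))

# Averaging every candidate blends the two modes into a pose that matches neither.
naive_angle = rotation_angle_deg(project_to_so3(rotations.mean(axis=0)))

# Averaging only the densest cluster stays on one mode.
cluster_rotation, cluster_translation, members = aggregate_by_clustering(
    rotations, translations
)
cluster_angle = rotation_angle_deg(cluster_rotation)

print(f"naive mean angle:   {naive_angle:.1f} deg")
print(f"cluster mean angle: {cluster_angle:.1f} deg ({members.sum()} members)")
```

In this toy setup the naive average lands between the two modes, while the cluster-restricted average stays close to the dominant one. A real pipeline would work with full 6D pose samples and could use a different clustering method entirely.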

Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking

Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation

3D Point-to-Keypoint Voting Network for 6D Pose Estimation

OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB

ROV6D: 6D Pose Estimation Benchmark Dataset for Underwater Remotely Operated Vehicles

BOP: Benchmark for 6D Object Pose Estimation

PA-Pose: Partial Point Cloud Fusion Based on Reliable Alignment for 6D Pose Tracking

ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics

A Survey of 6DoF Object Pose Estimation Methods for Different Application Scenarios

GeoPose: Dense Reconstruction Guided 6D Object Pose Estimation with Geometric Consistency

Open-vocabulary object 6D pose estimation

6IMPOSE: bridging the reality gap in 6D pose estimation for robotic grasping

Real-Time and Efficient 6-D Pose Estimation from a Single RGB Image

POPE: 6-DoF Promptable Pose Estimation of Any Object, in Any Scene, with One Reference

SS-Pose: Self-Supervised 6-D Object Pose Representation Learning Without Rendering

OnePose: One-Shot Object Pose Estimation Without CAD Models

BOP-Distrib: Revisiting 6D Pose Estimation Benchmark for Better Evaluation under Visual Ambiguities

For A More Comprehensive Evaluation of 6dof Object Pose Tracking

OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation

A 6DoF Pose Estimation Dataset and Network for Multiple Parametric Shapes in Stacked Scenarios

Fine segmentation and difference-aware shape adjustment for category-level 6DoF object pose estimation