Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking

Jiyao Zhang, Weiyao Huang, Bo Peng, Mingdong Wu, Fei Hu, Zijian Chen, Bo Zhao, Hao Dong
2024-06-07
Abstract: 6D Object Pose Estimation is a crucial yet challenging task in computer vision, suffering from a significant lack of large-scale datasets. This scarcity impedes comprehensive evaluation of model performance, limiting research advancements. Furthermore, the restricted number of available instances or categories curtails its applications. To address these issues, this paper introduces Omni6DPose, a substantial dataset characterized by its diversity in object categories, large scale, and variety in object materials. Omni6DPose is divided into three main components: ROPE (Real 6D Object Pose Estimation Dataset), which includes 332K images annotated with over 1.5M annotations across 581 instances in 149 categories; SOPE (Simulated 6D Object Pose Estimation Dataset), consisting of 475K images created in a mixed reality setting with depth simulation, annotated with over 5M annotations across 4162 instances in the same 149 categories; and the manually aligned real scanned objects used in both ROPE and SOPE. Omni6DPose is inherently challenging due to the substantial variations and ambiguities. To address this challenge, we introduce GenPose++, an enhanced version of the SOTA category-level pose estimation framework, incorporating two pivotal improvements: Semantic-aware feature extraction and Clustering-based aggregation. Moreover, we provide a comprehensive benchmarking analysis to evaluate the performance of previous methods on this large-scale dataset in the realms of 6D object pose estimation and pose tracking.
Computer Vision and Pattern Recognition
### What problems does this paper attempt to solve?

This paper addresses several key challenges in 6D object pose estimation and pose tracking within computer vision:

1. **Lack of large-scale datasets**:
   - Research on 6D object pose estimation is limited by the lack of large-scale datasets, which hinders comprehensive evaluation of model performance and restricts research progress.
   - Existing datasets contain few instances or categories, which narrows their range of applications.
2. **Insufficient diversity and complexity of datasets**:
   - Existing datasets lack diversity in object categories, materials, and scene variations, so they cannot fully reflect the complexity of the real world.
3. **Limitations of model performance evaluation**:
   - There is no comprehensive benchmark for evaluating existing methods on 6D object pose estimation and pose tracking.

To solve these problems, the paper introduces a new dataset and a model framework:

- **Omni6DPose dataset**: A 6D object pose estimation dataset with substantial diversity in categories, instances, and materials, divided into three main parts:
  - **ROPE (Real 6D Object Pose Estimation Dataset)**: 332,000 images with more than 1.5 million annotations, covering 581 instances in 149 categories.
  - **SOPE (Simulated 6D Object Pose Estimation Dataset)**: 475,000 images generated in a mixed-reality setting with depth simulation, with more than 5 million annotations, covering 4,162 instances in the same 149 categories.
  - Manually aligned real scanned objects used in both ROPE and SOPE.
- **GenPose++ model**: An enhanced version of the SOTA category-level pose estimation framework, with two key improvements:
  - **Semantic-aware feature extraction**: It combines RGB images and point cloud information to improve the accuracy of feature extraction.
  - **Clustering-based aggregation**: It uses a clustering algorithm to handle multimodal distributions, which addresses pose estimation for objects with non-continuous symmetry.

Through these improvements, the paper provides a richer and more challenging dataset and proposes a stronger model framework to advance 6D object pose estimation and pose tracking.
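To illustrate why clustering helps with multimodal pose distributions, here is a minimal sketch in Python. It is not the GenPose++ implementation, and the paper's actual clustering procedure is not described in this summary. The sketch assumes a set of sampled rotation hypotheses given as unit quaternions `(w, x, y, z)`, and uses a simple neighbor-count mode selection as a stand-in clustering step. The function names (`quat_angle`, `quat_about_z`, `aggregate_by_clustering`) and the 15-degree radius are hypothetical choices made for illustration.

```python
import math


def quat_angle(q1, q2):
    """Geodesic angle in radians between two unit quaternions (w, x, y, z)."""
    # abs() treats q and -q as the same rotation.
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    return 2.0 * math.acos(min(1.0, dot))


def quat_about_z(angle_rad):
    """Unit quaternion for a rotation of angle_rad about the z axis."""
    half = 0.5 * angle_rad
    return (math.cos(half), 0.0, 0.0, math.sin(half))


def aggregate_by_clustering(quats, radius_rad):
    """Pick the densest group of hypotheses and average only that group.

    Assumption: a hypothesis with the most neighbors within radius_rad
    marks the dominant mode. This is a stand-in for a real clustering
    algorithm, chosen for brevity.
    """
    best_center, best_members = None, []
    for center in quats:
        members = [q for q in quats if quat_angle(center, q) <= radius_rad]
        if len(members) > len(best_members):
            best_center, best_members = center, members

    # Normalized mean quaternion: a common approximation of the average
    # rotation when the members are tightly grouped.
    acc = [0.0, 0.0, 0.0, 0.0]
    for q in best_members:
        # Flip sign so every member lies in the same half-space as the center.
        sign = 1.0 if sum(a * b for a, b in zip(best_center, q)) >= 0.0 else -1.0
        for i in range(4):
            acc[i] += sign * q[i]
    norm = math.sqrt(sum(c * c for c in acc))
    return tuple(c / norm for c in acc), len(best_members)


# Hypotheses for an object with a 180-degree symmetry about z:
# six samples near 0 degrees and four samples near 180 degrees.
hypotheses = [quat_about_z(math.radians(d)) for d in (-4, -2, 0, 1, 3, 5)]
hypotheses += [quat_about_z(math.radians(d)) for d in (176, 179, 181, 184)]

estimate, size = aggregate_by_clustering(hypotheses, math.radians(15))
error_deg = math.degrees(quat_angle(estimate, quat_about_z(0.0)))
print(f"cluster size: {size}, error vs. 0-degree mode: {error_deg:.2f} deg")
```

Averaging all ten hypotheses directly would blend the two modes into a rotation that matches neither. Restricting the average to the dominant cluster keeps the estimate on one valid pose.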