Abstract: 6-D pose estimation is essential for reliable robotic grasping. The prevalent approach currently relies on keypoint correspondences. It depends on choosing keypoint locations on the object and then detecting and localizing those keypoints in real scenes, and it solves for the pose with a random sample consensus (RANSAC)-based perspective-n-point (PnP) algorithm. This solver is nondifferentiable, so the loss cannot be backpropagated through it during training. Direct regression, by contrast, is fast and differentiable, but its pose estimation performance falls short and needs improvement. To address these gaps, we present PPM6D, a new method for 6-D object pose estimation based on regression and point pair matching. First, a proposed cross-fusion module fuses RGB features and point cloud features so that they complement each other. Next, an attention module adjusts the features of the object's 3-D model. Finally, a point pair matching module matches points and their features, yielding an integrated matching and fusion. PPM6D is extensively trained and tested on the benchmark datasets LINEMOD, Occlusion LINEMOD (LINEMOD-occ), YCB-Video, and T-LESS. Experimental results show that PPM6D can outperform many keypoint-based pose estimation methods while running relatively fast, thereby offering new ideas for regression-based pose estimation. When applied to real-world object pose estimation and grasping tasks on an actual Baxter robot, PPM6D performs better than most alternatives.
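
The contrast between a nondifferentiable solver and direct regression can be made concrete with a small sketch. The code below is not PPM6D's network or loss. It is a generic illustration that assumes a regressor outputs a quaternion and a translation, and it scores them by the mean distance between model points under the predicted and ground-truth poses. The names `quat_to_rot` and `pose_loss` are hypothetical. Every step is smooth arithmetic, which is why a loss of this kind could be backpropagated in an autodiff framework, whereas a RANSAC-based PnP step involves random sampling and hard inlier selection.

```python
import numpy as np

def quat_to_rot(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)  # normalize so the result is a valid rotation
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def pose_loss(model_pts, q_pred, t_pred, R_gt, t_gt):
    """Mean distance between model points under predicted and ground-truth poses."""
    pred = model_pts @ quat_to_rot(q_pred).T + t_pred  # points under the regressed pose
    gt = model_pts @ R_gt.T + t_gt                     # points under the true pose
    return np.linalg.norm(pred - gt, axis=1).mean()

# Synthetic data (assumption: 100 random points stand in for an object model).
rng = np.random.default_rng(0)
model_pts = rng.normal(size=(100, 3))
q_identity = np.array([1.0, 0.0, 0.0, 0.0])
R_gt = quat_to_rot(q_identity)
t_gt = np.array([0.1, 0.0, 0.5])

loss_exact = pose_loss(model_pts, q_identity, t_gt, R_gt, t_gt)
loss_off = pose_loss(model_pts, q_identity, t_gt + np.array([0.0, 0.0, 0.02]), R_gt, t_gt)
```

Here `loss_exact` is zero because the predicted pose equals the ground truth, and `loss_off` equals the size of the translation error because every point is shifted by the same offset.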

Estimating 6D Object Poses with Temporal Motion Reasoning for Robot Grasping in Cluttered Scenes

MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion

Recurrent Volume-based 3D Feature Fusion for Real-time Multi-view Object Pose Estimation

Temporal Consistent Object Pose Estimation from Monocular Videos

Recurrent Volume-Based 3-D Feature Fusion for Real-Time Multiview Object Pose Estimation

DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion

MOTPose: Multi-object 6D Pose Estimation for Dynamic Video Sequences using Attention-based Temporal Fusion

FEIF: Feature Excitation and Interactive Fusion for 6D Object Pose Estimation

PA-Pose: Partial Point Cloud Fusion Based on Reliable Alignment for 6D Pose Tracking

RFFCE: Residual Feature Fusion and Confidence Evaluation Network for 6dof Pose Estimation

6-DoF grasp estimation method that fuses RGB-D data based on external attention

PoseFusion: Robust Object-in-Hand Pose Estimation with SelectLSTM

6D Pose Estimation with Combined Deep Learning and 3D Vision Techniques for a Fast and Accurate Object Grasping

A Transformer-based multi-modal fusion network for 6D pose estimation

Deep instance segmentation and 6D object pose estimation in cluttered scenes for robotic autonomous grasping

HFF6D: Hierarchical Feature Fusion Network for Robust 6D Object Pose Tracking

6-D Object Pose Estimation Based on Point Pair Matching for Robotic Grasp Detection

6D Hybrid Pose Estimation in Cluttered Industrial Scenes for Robotic Grasping

Single-Camera Multi-View 6DoF pose estimation for robotic grasping

6IMPOSE: bridging the reality gap in 6D pose estimation for robotic grasping

Active 6D Multi-Object Pose Estimation in Cluttered Scenarios with Deep Reinforcement Learning