Abstract: This paper addresses the challenge of 6DoF pose estimation of texture-less objects from a single RGB image. Many recent works have shown that two-stage deep learning approaches based on the fusion of 2D geometric intermediate representations achieve remarkable results. These methods implicitly explore the mapping from the 2D appearance domain to the 3D structure domain. However, without the 3D geometric constraints that depth maps provide, it is difficult to extract enough cues from appearance features alone to learn the geometric relationship of projection from 3D viewpoints to 2D planes, and the estimation process is extremely sensitive to occlusion. We propose a novel network called MLFNet that lifts the feature space from 2D to 3D based on hybrid 3D geometric intermediate representations. For the first time, we propose surface normals in the object coordinate system as an intermediate representation for pose; their sharp variation provides strong cues for keypoints, which are usually located at abrupt changes in the object surface. Dense 3D surfaces can enhance the geometric consistency of multi-representation constraints and retain more information in occluded scenes. With the proposed multi-modality dual attention mechanism and the embedding of standard 3D shape knowledge, the 2D geometric representation learning process explicitly depends on the fusion of 2D appearance features and 3D geometric features. This standardized information fusion pattern among 2D intermediate representations, 3D intermediate representations, and CAD model priors significantly reduces the network learning space. The proposed method achieves competitive performance on the Linemod dataset and outperforms state-of-the-art methods on the Occlusion Linemod and T-Less datasets, which demonstrates the feasibility of the multi-representation fusion technique for pose estimation. The project site is at https://github.com/JJJano/MLFNet.

TGF-Net: Sim2Real Transparent Object 6D Pose Estimation Based on Geometric Fusion

TransNet: Transparent Object Manipulation Through Category-Level Pose Estimation

Transformer Based Feature Pyramid Network for Transparent Objects Grasp

6DoF Pose Estimation of Transparent Object from a Single RGB-D Image

EBFA-6D: End-to-End Transparent Object 6D Pose Estimation Based on a Boundary Feature Augmented Mechanism

Zero-Shot 3D Pose Estimation of Unseen Object by Two-Step RGB-D Fusion

DFNet-Trans: An end-to-end multibranching network for depth estimation for transparent objects

KGNet: Knowledge-Guided Networks for Category-Level 6D Object Pose and Size Estimation

A Geometry-Enhanced 6D Pose Estimation Network with Incomplete Shape Recovery for Industrial Parts

DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via Deformable Template Field

ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation

StereoPose: Category-Level 6D Transparent Object Pose Estimation from Stereo Images via Back-View NOCS

MLFNet: Monocular lifting fusion network for 6DoF texture-less object pose estimation

ClearPose: Large-scale Transparent Object Dataset and Benchmark

TODE-Trans: Transparent Object Depth Estimation with Transformer

Visual-tactile Fusion for Transparent Object Grasping in Complex Backgrounds

TRansPose: Large-Scale Multispectral Dataset for Transparent Object

DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion

Transparent Object Depth Completion

Transparency-Aware Segmentation of Glass Objects to Train RGB-Based Pose Estimators

TransPose: 6D Object Pose Estimation with Geometry-Aware Transformer