Abstract: This paper addresses the challenge of 6DoF pose estimation for texture-less objects from a single RGB image. Many recent works have shown that two-stage deep learning approaches based on the fusion of 2D geometric intermediate representations achieve remarkable results. These methods implicitly learn the mapping from the 2D appearance domain to the 3D structure domain. However, without the 3D geometric constraints that depth maps provide, it is difficult to extract enough clues from appearance features alone to capture the geometric relation of projection from 3D viewpoints to 2D image planes, and this estimation process is extremely sensitive to occlusion. We propose a novel network called MLFNet that lifts the feature space from 2D to 3D based on hybrid 3D geometric intermediate representations. For the first time, we propose surface normals in the object coordinate system as an intermediate representation of pose; their abrupt variations provide strong clues for keypoints, which are usually located where the object surface changes sharply. Dense 3D surfaces can enhance the geometric consistency of multi-representation constraints and retain more information in occluded scenes. With the proposed multi-modality dual attention mechanism and the embedding of standard 3D shape knowledge, the 2D geometric representation learning process explicitly depends on the fusion of 2D appearance features and 3D geometric features. This standardized information fusion pattern among 2D intermediate representations, 3D intermediate representations, and CAD model priors significantly reduces the network's learning space. The proposed method achieves competitive performance on the Linemod dataset and outperforms state-of-the-art methods on the Occlusion Linemod and T-Less datasets, which demonstrates the feasibility of the pose multi-representation fusion technique. The project site is at https://github.com/JJJano/MLFNet.
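The idea that abrupt changes in object-frame surface normals mark likely keypoint locations can be illustrated with a small NumPy sketch. It assumes a dense H x W x 3 map of object-frame 3D coordinates is available (the kind of dense 3D surface representation the abstract mentions). The helper names and the 30-degree threshold are hypothetical, and the normals here are derived geometrically by finite differences rather than predicted by a network as in MLFNet; this is an illustration of the geometric intuition, not the paper's implementation.

```python
import numpy as np


def normals_from_coord_map(xyz: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Unit surface normals from an (H, W, 3) map of object-frame 3D coordinates.

    Hypothetical helper for illustration: normals are computed from finite
    differences of the coordinate map, not predicted by a network.
    """
    # Tangent vectors along image columns (u) and rows (v), central differences
    du = np.gradient(xyz, axis=1)
    dv = np.gradient(xyz, axis=0)
    # The normal is perpendicular to both tangent vectors
    n = np.cross(du, dv)
    length = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.maximum(length, eps)


def normal_change(normals: np.ndarray) -> np.ndarray:
    """Per-pixel angle (radians) to the right or lower neighbour, whichever is larger."""
    change = np.zeros(normals.shape[:2])
    # Cosine of the angle between horizontally and vertically adjacent normals
    cos_u = np.clip(np.sum(normals[:, 1:] * normals[:, :-1], axis=-1), -1.0, 1.0)
    cos_v = np.clip(np.sum(normals[1:, :] * normals[:-1, :], axis=-1), -1.0, 1.0)
    change[:, :-1] = np.maximum(change[:, :-1], np.arccos(cos_u))
    change[:-1, :] = np.maximum(change[:-1, :], np.arccos(cos_v))
    return change


def keypoint_candidates(xyz: np.ndarray, thresh_deg: float = 30.0) -> np.ndarray:
    """Boolean mask of pixels where the surface normal changes abruptly (hypothetical threshold)."""
    return normal_change(normals_from_coord_map(xyz)) > np.deg2rad(thresh_deg)


# Usage: a "roof" surface z = |x| whose ridge lies between image columns 3 and 4
x, y = np.meshgrid(np.linspace(-1, 1, 8), np.linspace(-1, 1, 6))
roof = np.stack([x, y, np.abs(x)], axis=-1)
mask = keypoint_candidates(roof)
print(np.unique(np.nonzero(mask)[1]))  # prints [3]
```

On the two planar faces the normals are constant, so the angular change is essentially zero; only the pixels straddling the ridge exceed the threshold. Because the coordinate map is already expressed in the object frame, the resulting normals are too; a camera-frame normal n_cam would be brought into the object frame as n_obj = R^T n_cam, where R is the object-to-camera rotation.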

Depth-Based Lightweight Feature Fusion Network for Category-Level 6D Pose Estimation

3D Point-to-Keypoint Voting Network for 6D Pose Estimation

Recurrent Volume-based 3D Feature Fusion for Real-time Multi-view Object Pose Estimation

FEIF: Feature Excitation and Interactive Fusion for 6D Object Pose Estimation

A modal fusion network with dual attention mechanism for 6D pose estimation

A Lightweight Color and Geometry Feature Extraction and Fusion Module for End-to-end 6D Pose Estimation

RFFCE: Residual Feature Fusion and Confidence Evaluation Network for 6DoF Pose Estimation

A Transformer-based multi-modal fusion network for 6D pose estimation

Zero-Shot 3D Pose Estimation of Unseen Object by Two-Step RGB-D Fusion

MixedFusion: 6D Object Pose Estimation from Decoupled RGB-Depth Features

PA-Pose: Partial Point Cloud Fusion Based on Reliable Alignment for 6D Pose Tracking

Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation from Monocular RGB Image

A Geometry-Enhanced 6D Pose Estimation Network with Incomplete Shape Recovery for Industrial Parts

LHFF-Net: Local heterogeneous feature fusion network for 6DoF pose estimation

DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion

KGNet: Knowledge-Guided Networks for Category-Level 6D Object Pose and Size Estimation

PANet: A Pixel-Level Attention Network for 6D Pose Estimation With Embedding Vector Features

Robust Classification and 6D Pose Estimation by Sensor Dual Fusion of Image and Point Cloud Data

MLFNet: Monocular lifting fusion network for 6DoF texture-less object pose estimation

HFF6D: Hierarchical Feature Fusion Network for Robust 6D Object Pose Tracking