Abstract: 6D pose estimation is a critical technology that enables robots to perceive and interact with their operating environment. However, occlusion causes a loss of local features, which in turn limits estimation accuracy. To address this challenge, this paper proposes DA2Net, an end-to-end pose-estimation network based on a multi-channel attention mechanism. First, a multi-channel attention mechanism, designated "DA2Net", is devised on the foundation of A2-Nets. The mechanism is constructed in two steps. In the first step, key features are extracted from the global feature space through second-order attention pooling. In the second step, a feature map is generated by integrating position and channel attention. The extracted key features are then assigned to each position of the feature map, enhancing both the feature representation capacity and the overall performance. Second, the designed attention mechanism is introduced into both the feature fusion network and the iterative pose refinement network to strengthen the network's capacity to acquire local features, thereby improving its overall performance. Experimental results show that the estimation accuracy of DenseFusion-DA2 on the LineMOD dataset is approximately 3.4% higher than that of DenseFusion, and that it surpasses PoseCNN, PVNet, SSD6D, and PointFusion by 8.3%, 11.1%, 20.3%, and 23.8%, respectively. The method also shows a significant accuracy advantage on the Occluded LineMOD and HR-Vision datasets. This research presents a more efficient solution for robot perception and introduces new ideas and methods for technological advances and applications in related fields.
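
The two-step gather-and-distribute idea that the abstract attributes to A2-Nets can be illustrated with a minimal NumPy sketch. This is not the DA2Net implementation: the function name `double_attention`, the weight matrices (standing in for 1x1 convolutions), and the residual connection are assumptions made for illustration, and the position/channel attention integration specific to DA2Net is not modelled here.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def double_attention(x, w_feat, w_gather, w_dist, w_out):
    """Illustrative A2-Nets-style block (hypothetical helper, not the paper's code).

    x        : feature map of shape (C, H, W)
    w_feat   : (m, C) projection producing the features to be pooled
    w_gather : (n, C) projection producing the gathering attention maps
    w_dist   : (n, C) projection producing the distribution attention vectors
    w_out    : (C, m) projection back to the input channel count
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                    # (C, HW)

    feats = w_feat @ flat                         # (m, HW)
    # Step 1: second-order attention pooling. Each of the n attention maps
    # is normalised over all positions and pools one global descriptor.
    gather = softmax(w_gather @ flat, axis=1)     # (n, HW)
    descriptors = feats @ gather.T                # (m, n)

    # Step 2: every position selects a soft mixture of the n descriptors.
    dist = softmax(w_dist @ flat, axis=0)         # (n, HW), columns sum to 1
    z = descriptors @ dist                        # (m, HW)

    # Assumed residual connection so the block refines the input features.
    out = flat + w_out @ z                        # (C, HW)
    return out.reshape(c, h, w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c, m, n, h, w = 8, 4, 3, 5, 6
    x = rng.normal(size=(c, h, w))
    y = double_attention(
        x,
        rng.normal(size=(m, c)),
        rng.normal(size=(n, c)),
        rng.normal(size=(n, c)),
        rng.normal(size=(c, m)),
    )
    print(y.shape)
```

Because the descriptors are pooled over the whole feature map, every position can draw on global context, which is the property the abstract relies on to recover features lost to occlusion.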

DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion

MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion

Recurrent Volume-based 3D Feature Fusion for Real-time Multi-view Object Pose Estimation

FEIF: Feature Excitation and Interactive Fusion for 6D Object Pose Estimation

3D Point-to-Keypoint Voting Network for 6D Pose Estimation

MixedFusion: 6D Object Pose Estimation from Decoupled RGB-Depth Features

RFFCE: Residual Feature Fusion and Confidence Evaluation Network for 6DoF Pose Estimation

Estimating 6D Object Poses with Temporal Motion Reasoning for Robot Grasping in Cluttered Scenes

RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images

Towards Two-view 6D Object Pose Estimation: A Comparative Study on Fusion Strategy

A Transformer-based multi-modal fusion network for 6D pose estimation

Zero-Shot 3D Pose Estimation of Unseen Object by Two-Step RGB-D Fusion

6D Object Pose Estimation in Cluttered Scenes from RGB Images

DenseFusion-DA2: End-to-End Pose-Estimation Network Based on RGB-D Sensors and Multi-Channel Attention Mechanisms

PA-Pose: Partial Point Cloud Fusion Based on Reliable Alignment for 6D Pose Tracking

A Lightweight Color and Geometry Feature Extraction and Fusion Module for End-to-end 6D Pose Estimation

6IMPOSE: bridging the reality gap in 6D pose estimation for robotic grasping

6-DoF grasp estimation method that fuses RGB-D data based on external attention

Robust Classification and 6D Pose Estimation by Sensor Dual Fusion of Image and Point Cloud Data

A modal fusion network with dual attention mechanism for 6D pose estimation