A robot grasping detection network based on flexible selection of multi-modal feature fusion structure
Yuhan Wang, Zhibo Guo, Yu Chen, Chaiqi Guo, Meizhen Xia, Tingyue Qi
DOI: https://doi.org/10.1007/s10489-024-05427-9
IF: 5.3
2024-04-14
Applied Intelligence
Abstract: In unstructured scenarios, objects usually have unique shapes, poses, and other uncertainties, which place higher demands on a robot's planar grasp detection ability. Most previous methods predict gripper configurations from single-modal data or from simply fused multi-modal data. Single-modal data cannot comprehensively describe the diversity of objects, and simple fusion may ignore the dependencies between modalities. To address these issues, we propose a Multi-modal Dynamic Cooperative Fusion Network (MDCNet). It includes a Multilevel Semantic Guided Fusion Module (MSG), which uses enhanced semantic guidance vectors to suppress the undesired influence factors produced by different fusion structures. We also design a general Enhanced Feature Pyramid Nets Structure (EFPN) to learn the dependencies between fine-grained and coarse-grained features and to improve the robustness of the encoder in unstructured scenarios. The proposed method achieves an accuracy of 98.9% on the Jacquard dataset and 99.6% on the Cornell dataset. In over 2000 robotic grasp trials, it achieves a grasp success rate of 98.8% in single-object scenarios and 93.5% in cluttered scenarios. The proposed method outperforms previous grasp detection methods in both speed and accuracy and offers strong real-time performance.
computer science, artificial intelligence
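The abstract's idea of using a guidance vector to weight features from different modalities can be illustrated with a generic channel-gated fusion. The sketch below is not the paper's MSG module: the function name `guided_fusion`, the pooling-plus-linear gate, and the `weight` and `bias` parameters are all assumptions chosen for illustration, using NumPy only.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def guided_fusion(rgb_feat, depth_feat, weight, bias):
    """Fuse two (C, H, W) feature maps with a per-channel gate.

    Hypothetical illustration, not the MSG module from the paper:
    a guidance vector is derived from globally pooled features of both
    modalities and decides, per channel, how much each modality contributes.
    """
    if rgb_feat.shape != depth_feat.shape:
        raise ValueError("feature maps must have the same shape")

    # Global average pooling summarizes each channel of each modality.
    pooled = np.concatenate(
        [rgb_feat.mean(axis=(1, 2)), depth_feat.mean(axis=(1, 2))]
    )  # shape (2C,)

    # A linear map followed by a sigmoid yields a gate in (0, 1) per channel.
    gate = sigmoid(weight @ pooled + bias)  # shape (C,)
    gate = gate[:, None, None]  # broadcast over H and W

    # Convex combination of the two modalities, channel by channel.
    return gate * rgb_feat + (1.0 - gate) * depth_feat


# Example with random features; zero weights give a gate of 0.5 everywhere.
rng = np.random.default_rng(0)
rgb = rng.normal(size=(4, 8, 8))
depth = rng.normal(size=(4, 8, 8))
weight = np.zeros((4, 8))
bias = np.zeros(4)
fused = guided_fusion(rgb, depth, weight, bias)
```

With zero weights and bias the gate equals 0.5 for every channel, so the result is the plain average of the two inputs; in a trained network the gate parameters would be learned so that each channel favors the more informative modality.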