Abstract: Grasping in stacked scenarios is an indispensable capability for intelligent robots. However, in multi-object stacking or occluded scenes, existing algorithms that grasp the target object directly have a high failure rate, while methods that first clear the scene are inefficient. Grasping in a logical sequence that leads to the target object is therefore imperative. To address this challenge, we propose an end-to-end grasping model based on a Gated Self Attention Network (GSAN), designed to guide robots to perform optimal sequential grasping of target objects within dense and cluttered scenes. We integrate object detection, grasp detection, and stacking relationship reasoning into a single deep neural network. Specifically, the object detection and grasp detection networks extract features from input RGB images and estimate object categories, bounding boxes, and grasp poses. The GSAN captures the non-Euclidean information between object features in high-dimensional space, enhancing the accuracy of triplet relationship reasoning through gated self-attention and positional encoding. Our algorithm achieves the best results on the Visual Manipulation Relationship Dataset (VMRD), with an OP of 92.07%, an OR of 91.67%, and an IA of 81.67%, and extensive ablation studies confirm the necessity of each component of our method. As the first end-to-end grasping framework to incorporate self-attention into the relationship reasoning module, our proposed method enhances the logical capabilities of robots, enabling efficient grasping operations in complex and dynamic scenes and fostering human-robot collaboration.
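
To make the gated self-attention idea more concrete, the following is a minimal sketch and not the GSAN implementation itself. The abstract does not specify the gate, the positional encoding, or any layer sizes, so everything here is an assumption chosen for illustration: the positional encoding is a linear projection of normalized bounding boxes added to the object features, the gate is a sigmoid over the concatenated input and attention output, and the names `gated_self_attention`, `init_params`, `w_pos`, `w_q`, `w_k`, `w_v`, and `w_g` are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [v / total for v in exps]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def init_params(d, seed=0):
    """Random weights for illustration only (hypothetical shapes)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(cols)]
                for _ in range(rows)]

    return {"w_pos": mat(4, d), "w_q": mat(d, d), "w_k": mat(d, d),
            "w_v": mat(d, d), "w_g": mat(2 * d, d)}


def gated_self_attention(feats, boxes, p):
    """feats: n x d object features; boxes: n x 4 normalized boxes.

    Returns the gated features (n x d) and the attention map (n x n).
    """
    d = len(feats[0])
    # Assumed positional encoding: project each box and add it to the feature.
    pos = matmul(boxes, p["w_pos"])
    x = [[f + e for f, e in zip(fr, er)] for fr, er in zip(feats, pos)]
    # Scaled dot-product self-attention across all detected objects.
    q, k, v = matmul(x, p["w_q"]), matmul(x, p["w_k"]), matmul(x, p["w_v"])
    k_t = [list(col) for col in zip(*k)]
    attn = [softmax([s / math.sqrt(d) for s in row]) for row in matmul(q, k_t)]
    ctx = matmul(attn, v)
    # Assumed gate: sigmoid over [x, ctx], mixing attended and original features.
    gate_in = [xr + cr for xr, cr in zip(x, ctx)]
    gate = [[sigmoid(g) for g in row] for row in matmul(gate_in, p["w_g"])]
    out = [[g * c + (1.0 - g) * xi for g, c, xi in zip(gr, cr, xr)]
           for gr, cr, xr in zip(gate, ctx, x)]
    return out, attn


if __name__ == "__main__":
    params = init_params(d=4)
    feats = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.3, 0.3]]
    boxes = [[0.2, 0.3, 0.1, 0.1], [0.5, 0.5, 0.2, 0.3], [0.7, 0.4, 0.1, 0.2]]
    out, attn = gated_self_attention(feats, boxes, params)
    print(len(out), len(out[0]))  # one gated feature vector per object
```

In a pipeline of the kind the abstract describes, the gated features of each object pair would then feed a relationship classifier; that step is omitted here because the abstract gives no detail on it.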

A Semantic Robotic Grasping Framework Based on Multi-Task Learning in Stacking Scenes

A Multi-task Convolutional Neural Network for Autonomous Robotic Grasping in Object Stacking Scenes

Robotic Grasping in Multi-Object Stacking Scenes Based on Visual Reasoning

Gated Self Attention Network for Efficient Grasping of Target Objects in Stacked Scenarios

SUGrasping: a Semantic Grasping Framework Based on Multi-Head 3D U-Net

RPRG: Toward Real-time Robotic Perception, Reasoning and Grasping with One Multi-task Convolutional Neural Network

A Robotic Semantic Grasping Method for Pick-and-place Tasks

Robotic Grasping Method Based on 3D Vision for Stacked Rectangular Objects

Efficient Grasp Detection Network with Gaussian-Based Grasp Representation for Robotic Manipulation

Task-Oriented Grasping in Object Stacking Scenes with CRF-Based Semantic Model

Residual Squeeze-and-Excitation Network with Multi-scale Spatial Pyramid Module for Fast Robotic Grasping Detection

A learning framework for semantic reach-to-grasp tasks integrating machine learning and optimization

A two-stage grasp detection method for sequential robotic grasping in stacking scenarios

A robot grasping detection network based on flexible selection of multi-modal feature fusion structure

Robot Dynamic Object Positioning and Grasping Method Based on Two Stages

A Single Multi-Task Deep Neural Network with a Multi-Scale Feature Aggregation Mechanism for Manipulation Relationship Reasoning in Robotic Grasping

UPG: 3D Vision-Based Prediction Framework for Robotic Grasping in Multi-Object Scenes

A Multi-Scale Robotic Tool Grasping Method for Robot State Segmentation Masks

Multitarget Flexible Grasping Detection Method for Robots in Unstructured Environments

Robotic Objects Detection and Grasping in Clutter Based on Cascaded Deep Convolutional Neural Network