Abstract: Grasping in stacked scenarios is an indispensable capability for intelligent robots. However, in multi-object stacking or occluded scenes, existing algorithms that grasp the target object directly have a high failure rate, while methods that clear the scene before grasping are inefficient. Hence, grasping operations for the target object must be executed in a logical sequence. To address this challenge, we propose an end-to-end grasping model based on a Gated Self Attention Network (GSAN), designed to guide robots to perform optimal sequential grasping of target objects in dense, cluttered scenes. We integrate object detection, grasp detection, and stacking relationship reasoning into a single deep neural network. Specifically, the object detection and grasp detection networks extract features from input RGB images and estimate object categories, bounding boxes, and grasp poses. The GSAN captures the non-Euclidean information between object features in high-dimensional space, improving the accuracy of triplet relationship reasoning through gated self-attention and positional encoding. Our algorithm achieves the best results on the Visual Manipulation Relationship Dataset (VMRD), with an OP of 92.07%, an OR of 91.67%, and an IA of 81.67%, and extensive ablation studies confirm the necessity of each component of our method. As the first end-to-end grasping framework to incorporate self-attention into the relationship reasoning module, our proposed method enhances the logical capabilities of robots, enabling efficient grasping operations in complex and dynamic scenes and fostering human-robot collaboration.
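
The idea of grasping "in a logical sequence" can be illustrated with a minimal sketch. It assumes the relationship reasoning stage has already produced pairwise stacking relations of the form (upper, lower), meaning the upper object must be removed before the lower one. The function name `grasp_sequence`, the tuple format, and the object labels are hypothetical and chosen only for illustration; this is not the GSAN model itself, only a plain depth-first ordering over predicted relations.

```python
from collections import defaultdict


def grasp_sequence(relations, target):
    """Return a removal order that ends with `target`.

    relations: iterable of (upper, lower) pairs, where `upper` rests on or
    occludes `lower` and therefore has to be grasped first (assumed format).
    """
    # Map each object to the set of objects that directly block it.
    above = defaultdict(set)
    for upper, lower in relations:
        above[lower].add(upper)

    order, done, in_progress = [], set(), set()

    def visit(obj):
        if obj in done:
            return
        if obj in in_progress:
            raise ValueError(f"cyclic stacking relation involving {obj!r}")
        in_progress.add(obj)
        # Clear every blocker (and its own blockers) before this object.
        for blocker in sorted(above[obj]):
            visit(blocker)
        in_progress.discard(obj)
        done.add(obj)
        order.append(obj)

    visit(target)
    return order


# Hypothetical scene: pen on box, box and tape on cup, apple on plate.
relations = [("box", "cup"), ("pen", "box"), ("tape", "cup"), ("apple", "plate")]
plan = grasp_sequence(relations, "cup")
print(plan)
```

Only objects that actually block the target appear in the plan, so the unrelated apple and plate are left untouched. This reflects the contrast drawn in the abstract between grasping the target directly, clearing the whole scene, and removing just the obstructing objects in order.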

Robotic Grasping in Multi-Object Stacking Scenes Based on Visual Reasoning

A Multi-task Convolutional Neural Network for Autonomous Robotic Grasping in Object Stacking Scenes

A Semantic Robotic Grasping Framework Based on Multi-Task Learning in Stacking Scenes

Robotic Grasping Method Based on 3D Vision for Stacked Rectangular Objects

Gated Self Attention Network for Efficient Grasping of Target Objects in Stacked Scenarios

RPRG: Toward Real-time Robotic Perception, Reasoning and Grasping with One Multi-task Convolutional Neural Network

Secure Grasping Detection of Objects in Stacked Scenes Based on Single-Frame RGB Images

A two-stage grasp detection method for sequential robotic grasping in stacking scenarios

UPG: 3D Vision-Based Prediction Framework for Robotic Grasping in Multi-Object Scenes

Task-Oriented Grasping in Object Stacking Scenes with CRF-Based Semantic Model

Robot Grasping Detection in Object Overlapping Scenes Based on Multi-Stage ROI Extraction

Efficient Grasp Detection Network with Gaussian-Based Grasp Representation for Robotic Manipulation

Visual Manipulation Relationship Detection based on Gated Graph Neural Network for Robotic Grasping

A Novel Vision-Based Multi-Task Robotic Grasp Detection Method for Multi-Object Scenes

Grasp Manipulation Relationship Detection Based on Graph Sample and Aggregation

Robot Dynamic Object Positioning and Grasping Method Based on Two Stages

Multitarget Flexible Grasping Detection Method for Robots in Unstructured Environments

Visual Manipulation Relationship Detection with Fully Connected CRFs for Autonomous Robotic Grasp

Grasping with Occlusion-Aware Ally Method in Complex Scenes

A Single Multi-Task Deep Neural Network with a Multi-Scale Feature Aggregation Mechanism for Manipulation Relationship Reasoning in Robotic Grasping