A Semantic Robotic Grasping Framework Based on Multi-Task Learning in Stacking Scenes.

Shengqi Duan, Guohui Tian, Zhongli Wang, Shaopeng Liu, Chenrui Feng
DOI: https://doi.org/10.1016/j.engappai.2023.106059
IF: 8
2023-01-01
Engineering Applications of Artificial Intelligence
Abstract: Autonomous robotic grasping is an essential skill for service robots performing specified tasks in unstructured scenarios. Previous work focuses on simple pick-and-place tasks, which is not satisfactory for real-world scenes that require manipulation. In this paper, we present a modular intelligent robot architecture based on a multi-task convolutional neural network, which can be used for grasping and manipulating specific objects in a stacked and cluttered environment. First, we propose an end-to-end, multi-task semantic grasping convolutional neural network (MSG-ConvNet) that simultaneously outputs grasp detection and semantic segmentation results in order to recognize the affiliations between objects and grasps in cluttered scenarios. Second, we propose a post-processing method that allows the robot to select an optimal grasping area through active perception, by simple reasoning over the multi-modal information output by the proposed model. Compared with the benchmark, the proposed multi-task network substantially improves both recognition accuracy and detection speed on the public multi-object dataset GraspNet-1Billion. The proposed grasp detection method also yields state-of-the-art performance, with accuracies of 95.06% and 98.6% on the public single-object Jacquard Dataset and Cornell Dataset, respectively. In addition, experiments in a real-world scene demonstrate that the proposed method is more robust and adaptable than a simple direct grasping strategy in environments with higher mutual occlusion.
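The idea of recognizing affiliations between objects and grasps can be illustrated with a minimal sketch. It is not the authors' MSG-ConvNet or their post-processing method. It assumes a hypothetical setup in which the segmentation output is a per-pixel class-label grid and each grasp candidate carries a center pixel, an angle, and a quality score. The names `Grasp`, `assign_labels`, and `best_grasp_for` are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Grasp:
    """Hypothetical grasp candidate (not the paper's representation)."""
    row: int        # grasp center, pixel row
    col: int        # grasp center, pixel column
    angle: float    # gripper rotation in radians
    quality: float  # predicted grasp confidence in [0, 1]


def assign_labels(grasps: Sequence[Grasp],
                  mask: Sequence[Sequence[int]]) -> List[int]:
    """Label each grasp with the segmentation class under its center pixel."""
    return [mask[g.row][g.col] for g in grasps]


def best_grasp_for(target: int,
                   grasps: Sequence[Grasp],
                   mask: Sequence[Sequence[int]]) -> Optional[Grasp]:
    """Return the highest-quality grasp affiliated with the target class.

    Returns None when no candidate lies on the target object, which is
    the case where a real system would need a further strategy.
    """
    labels = assign_labels(grasps, mask)
    candidates = [g for g, lab in zip(grasps, labels) if lab == target]
    return max(candidates, key=lambda g: g.quality, default=None)


# Toy segmentation grid: 0 = background, 1 and 2 = two object classes.
mask = [
    [0, 1, 1, 0],
    [0, 1, 2, 2],
    [0, 0, 2, 2],
]
grasps = [
    Grasp(0, 1, 0.0, 0.70),
    Grasp(1, 2, 1.2, 0.90),
    Grasp(2, 3, 0.4, 0.95),
    Grasp(1, 1, 0.3, 0.80),
]
chosen = best_grasp_for(1, grasps, mask)
```

Here the globally best-scoring grasp lies on class 2, so a direct highest-score strategy would pick the wrong object when class 1 is requested. Filtering by the segmentation label first restricts the choice to grasps that belong to the requested object.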