Language-Guided Category Push–Grasp Synergy Learning in Clutter by Efficiently Perceiving Object Manipulation Space

Min Zhao, Guoyu Zuo, Shuangyue Yu, Yongkang Luo, Chunfang Liu, Daoxiong Gong
DOI: https://doi.org/10.1109/tii.2024.3488774
IF: 12.3
2024-01-01
IEEE Transactions on Industrial Informatics
Abstract: In flexible manufacturing, robots need to adapt swiftly to constantly changing production tasks. However, grasping objects of a specific category in cluttered scenes by following language instructions remains a challenging problem. To address this issue, this article proposes a language-guided category push–grasp synergy network that follows a cognitive-decision framework. First, inspired by how humans understand the world through interaction with the environment, we propose an environment state difference embodied self-supervision method that enables robots to autonomously collect embodied multimodal data and generate ground truths free of annotation errors for training the cognition network. Second, we develop a language-guided embodied multimodal object cognition network that fuses color and depth image information, enhancing the robot's object cognition ability in cluttered scenes and enabling dynamic semantic segmentation based on language commands. Finally, we propose an object manipulation space metric that measures the manipulable space around target objects, and we link the reward function to the change in this metric before and after each action, thereby enhancing the system's perception of the manipulation space and improving operational performance. Experiments conducted in both simulated and real-world environments demonstrate that the proposed method outperforms existing state-of-the-art methods and generalizes to grasping novel objects.
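The abstract does not define the object manipulation space metric or the reward function, so the following is only an illustrative sketch of the general idea of tying a reward to the change in a free-space measure before and after an action. Everything in it is an assumption made for illustration: the metric (fraction of unoccupied cells in a ring around the target mask), the function names `dilate`, `manipulation_space`, and `push_reward`, and the parameters `radius` and `scale` are hypothetical and are not taken from the paper.

```python
import numpy as np


def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation with a square (2*radius+1) window, built from array shifts."""
    mask = mask.astype(bool)
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def manipulation_space(target: np.ndarray, occupied: np.ndarray, radius: int = 2) -> float:
    """Hypothetical proxy metric (an assumption, not the paper's definition):
    the fraction of cells in a ring around the target that no object occupies."""
    target = target.astype(bool)
    occupied = occupied.astype(bool)
    ring = dilate(target, radius) & ~target
    if not ring.any():
        return 0.0
    return float((ring & ~occupied).sum() / ring.sum())


def push_reward(metric_before: float, metric_after: float, scale: float = 1.0) -> float:
    """Assumed reward shape: proportional to the metric change caused by the action."""
    return scale * (metric_after - metric_before)
```

Under these assumptions, a single-cell target with `radius=1` has a ring of eight neighboring cells. If one neighbor is occupied, the proxy metric is 7/8; a push that clears that neighbor raises it to 1, giving a reward of 0.125 with `scale=1.0`.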