Grounding Language for Robotic Manipulation via Skill Library

Yinghao Zhao, Zhongxiang Zhou, Rong Xiong
DOI: https://doi.org/10.1109/MLCCIM60412.2023.00062
2023-01-01
Abstract: Given language instructions and a raw image, how can we enable robots to reason about semantic concepts and manipulate objects accordingly? Recent research on language-conditioned manipulation has introduced end-to-end frameworks that combine semantic understanding with precise spatial reasoning. However, these methods require large amounts of training data and fail to generalize to more complex task scenes. We propose a novel Language-Goal-Skill architecture that decouples language-visual grounding from skill learning, making it more effective and generalizable. It leverages pre-trained models to infer the manipulation skills, scene objects, and spatial relations, and builds a skill library for diverse task scenes. Experiments in simulated settings suggest that our approach achieves a higher success rate across multiple skills than baseline methods.