Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition

Huy Ha, Pete Florence, Shuran Song

2023-10-01

Abstract: We present a framework for robot skill acquisition, which 1) efficiently scales up data generation of language-labelled robot data and 2) effectively distills this data down into a robust multi-task language-conditioned visuo-motor policy. For (1), we use a large language model (LLM) to guide high-level planning, and sampling-based robot planners (e.g. motion or grasp samplers) for generating diverse and rich manipulation trajectories. To robustify this data-collection process, the LLM also infers a code snippet for the success condition of each task, simultaneously enabling the data-collection process to detect failure and retry as well as the automatic labeling of trajectories with success/failure. For (2), we extend the diffusion policy single-task behavior-cloning approach to multi-task settings with language conditioning. Finally, we propose a new multi-task benchmark with 18 tasks across five domains to test long-horizon behavior, common-sense reasoning, tool-use, and intuitive physics. We find that our distilled policy successfully learned the robust retrying behavior in its data collection procedure, while improving absolute success rates by 33.2% on average across five domains. Code, data, and additional qualitative results are available at https://www.cs.columbia.edu/~huy/scalingup/.
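
The failure-detection and retry loop described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper's code: the `Trajectory` record, the `collect` function, the toy dictionary state, and the scripted `make_flaky_grasp` skill. The sketch also assumes that retries are recorded inside a single trajectory and that the label reflects the final check; the paper's actual bookkeeping may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, float]  # toy stand-in for simulator state (assumption)


@dataclass
class Trajectory:
    task: str
    states: List[State] = field(default_factory=list)
    attempts: int = 0
    success: bool = False


def collect(task: str,
            state: State,
            attempt: Callable[[State], State],
            is_success: Callable[[State], bool],
            max_attempts: int = 3) -> Trajectory:
    """Retry until the success check passes or the attempt budget runs out."""
    traj = Trajectory(task=task, states=[dict(state)])
    for _ in range(max_attempts):
        state = attempt(state)           # e.g. one sampled grasp plus motion plan
        traj.states.append(dict(state))
        traj.attempts += 1
        if is_success(state):            # in the paper, this check is inferred by the LLM
            traj.success = True
            break
    return traj                          # labeled with success/failure either way


def make_flaky_grasp(fail_first: int) -> Callable[[State], State]:
    """Hypothetical skill that fails `fail_first` times, then succeeds."""
    calls = {"n": 0}

    def attempt(state: State) -> State:
        calls["n"] += 1
        new_state = dict(state)
        if calls["n"] > fail_first:
            new_state["block_in_bin"] = 1.0
        return new_state

    return attempt


def block_in_bin(state: State) -> bool:
    """Hand-written stand-in for an LLM-generated success condition."""
    return state["block_in_bin"] == 1.0


start = {"block_in_bin": 0.0}
ok = collect("put the block in the bin", start,
             make_flaky_grasp(fail_first=1), block_in_bin)
failed = collect("put the block in the bin", start,
                 make_flaky_grasp(fail_first=5), block_in_bin)
print(ok.attempts, ok.success)          # 2 True
print(failed.attempts, failed.success)  # 3 False
```

Because failed attempts stay in the recorded states, a policy cloned from such data sees examples of recovering after a failure, which is one plausible reading of how the retrying behavior ends up in the distilled policy.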

Robotics

What problem does this paper attempt to address?

This paper proposes a framework for robot skill acquisition that addresses two problems: scaling up the generation of language-labelled robot data efficiently, and learning a multi-task, language-conditioned visuomotor policy from that data. The framework has two parts:

1. Scaling up data generation: A large language model (LLM) handles high-level planning, while sampling-based robot planners (such as motion or grasp samplers) generate diverse manipulation trajectories. The LLM also infers a code snippet for the success condition of each task, which lets the data-collection process detect failures and retry, and automatically labels each trajectory as a success or a failure.
2. Distilling down: The single-task diffusion policy behavior-cloning approach is extended to a multi-task setting with language conditioning, yielding a closed-loop, language-conditioned visuomotor policy. The distilled policy learns the robust retrying behavior present in its data-collection procedure and improves absolute success rates by 33.2% on average across five domains.

The paper also proposes a new multi-task benchmark with 18 tasks across five domains, testing long-horizon behavior, common-sense reasoning, tool use, and intuitive physics. The core idea is to use the LLM's common-sense reasoning for efficient exploration while learning reusable 6-DoF skills for real-world applications. The paper reports that the framework outperforms other methods in both data-generation efficiency and policy-learning effectiveness, and that the learned policy transfers directly to the real world without fine-tuning.
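
To make the language-conditioned diffusion step more concrete, here is a toy sketch of sampling an action by iterative denoising, conditioned on an instruction string. It assumes plain DDPM ancestral sampling with a linear noise schedule, which may not match the sampler used in the paper. The `DEMO_ACTIONS` table and the oracle noise predictor are hypothetical stand-ins for the language encoder and the trained network, and visual observations are omitted entirely.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Vector = List[float]

# Hypothetical data: every demonstration for an instruction uses one action.
DEMO_ACTIONS: Dict[str, Vector] = {
    "open the drawer": [0.30, 0.00, 0.10],
    "close the drawer": [-0.30, 0.00, 0.10],
}


def make_schedule(num_steps: int, beta_start: float = 1e-4,
                  beta_end: float = 0.02) -> Tuple[Vector, Vector]:
    """Linear beta schedule and the cumulative products alpha_bar_t."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return betas, alpha_bars


def make_oracle_eps(alpha_bars: Vector) -> Callable[[Vector, int, str], Vector]:
    """Exact noise predictor for the single-action toy data above.

    A trained network would replace this; the oracle keeps the sketch checkable.
    """
    def eps_model(x: Vector, t: int, instruction: str) -> Vector:
        target = DEMO_ACTIONS[instruction]
        a = math.sqrt(alpha_bars[t])
        s = math.sqrt(1.0 - alpha_bars[t])
        return [(xi - a * ti) / s for xi, ti in zip(x, target)]
    return eps_model


def sample_action(eps_model: Callable[[Vector, int, str], Vector],
                  instruction: str, dim: int, betas: Vector,
                  alpha_bars: Vector, rng: random.Random) -> Vector:
    """DDPM reverse process: start from noise, denoise step by step."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t, instruction)   # conditioned on the instruction
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(1.0 - betas[t])
                for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean                         # no noise is added at the final step
    return x


betas, alpha_bars = make_schedule(num_steps=50)
eps_model = make_oracle_eps(alpha_bars)
rng = random.Random(0)
open_action = sample_action(eps_model, "open the drawer", 3, betas, alpha_bars, rng)
close_action = sample_action(eps_model, "close the drawer", 3, betas, alpha_bars, rng)
print([round(v, 3) for v in open_action])
print([round(v, 3) for v in close_action])
```

With the oracle predictor, the same sampler recovers a different action for each instruction, which is the sense in which one set of weights can serve several tasks. In a closed-loop policy this sampling would be repeated at each control step with fresh observations.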
