Abstract: Continual and interactive robot learning is a challenging problem because the robot operates alongside human users who expect it to keep learning novel skills for novel tasks, indefinitely and sample-efficiently. In this work, we present a framework for robots to query and learn visuo-motor skills and task-relevant information through natural-language dialogue with human users. Previous approaches either focus on improving the performance of instruction-following agents or learn novel skills or concepts passively. Instead, we use dialogue combined with a language-skill grounding embedding to query or confirm the skills and/or tasks requested by a user. To achieve this goal, we developed and integrated three components for our agent. First, we propose a novel visuo-motor control policy, ACT with Low-Rank Adaptation (ACT-LoRA), which enables the existing state-of-the-art (SoTA) ACT model to perform few-shot continual learning. Second, we develop an alignment model that projects demonstrations across skill embodiments into a shared embedding, which tells the agent when to ask users questions and/or request demonstrations. Finally, we integrate an existing LLM that interacts with the human user to perform grounded, interactive continual skill learning to solve a task. On the RLBench dataset, our ACT-LoRA model learns novel fine-tuned skills with 100% accuracy when trained with only five demonstrations per novel skill, while still maintaining 74.75% accuracy on pre-trained skills, a setting in which other models fall significantly short. We also performed a human-subjects study with 8 subjects to demonstrate the continual learning capabilities of our combined framework. On the real robot, learning from participant data, we achieve a 75% success rate on a sandwich-making task, demonstrating that robots can use our approach to learn novel skills or task knowledge from dialogue with non-expert users.
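To make the Low-Rank Adaptation idea behind ACT-LoRA concrete, here is a minimal, dependency-free sketch of a single LoRA-adapted linear layer. It assumes the standard LoRA formulation (frozen weight W plus a trainable update scaled by alpha / r, with B initialised to zero); the class name `LoRALinear` and its parameters are hypothetical illustrations, not the authors' ACT-LoRA code, which applies adapters inside the ACT transformer.

```python
import random


class LoRALinear:
    """Hypothetical sketch: frozen linear layer W plus a trainable low-rank update.

    Forward pass: y = W x + (alpha / r) * B (A x)
    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r)
    would be trained when a new skill is learned.
    """

    def __init__(self, weight, rank, alpha, seed=0):
        self.weight = weight  # frozen pre-trained weights, given as a list of rows
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A starts small and random, B starts at zero, so the adapter is
        # initially a no-op and the pre-trained behaviour is unchanged.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    @staticmethod
    def _matvec(m, v):
        # Plain matrix-vector product on nested lists.
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

    def forward(self, x):
        base = self._matvec(self.weight, x)
        low_rank = self._matvec(self.B, self._matvec(self.A, x))
        return [b + self.scale * l for b, l in zip(base, low_rank)]

    def merged_weight(self):
        # W' = W + (alpha / r) * B A: an equivalent single matrix for deployment.
        r = len(self.A)
        return [
            [w + self.scale * sum(self.B[i][k] * self.A[k][j] for k in range(r))
             for j, w in enumerate(row)]
            for i, row in enumerate(self.weight)
        ]

    def trainable_params(self):
        # Only the adapter matrices are trainable.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    d = 64
    layer = LoRALinear([[0.0] * d for _ in range(d)], rank=4, alpha=8.0)
    print(layer.trainable_params(), d * d)  # 512 4096
```

With d = 64 and rank 4, the adapter trains 512 parameters instead of the 4,096 in the full matrix, which is why LoRA-style fine-tuning is attractive for learning from only a handful of demonstrations. Because W is never modified, the original (pre-trained) behaviour can in principle be recovered by dropping the adapter.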
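The abstract states that the shared embedding lets the agent decide when to ask questions or request demonstrations. A minimal sketch of one possible decision rule follows, assuming the user's request and each known skill are already embedded in the same vector space. The cosine-similarity nearest-neighbour lookup, the two thresholds, the three outcomes, and the names `decide_action` and `skill_library` are all hypothetical; the paper's alignment model is learned, and this only illustrates how such an embedding could drive the dialogue.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def decide_action(request_emb, skill_library, confirm_threshold=0.9, ask_threshold=0.6):
    """Hypothetical rule: map a request embedding to a dialogue action.

    Returns ("execute" | "confirm" | "ask_demo", skill_name_or_None).
    """
    if not skill_library:
        # No known skills yet: the only option is to ask for a demonstration.
        return "ask_demo", None
    # Nearest known skill in the shared embedding space.
    best_name, best_sim = max(
        ((name, cosine(request_emb, emb)) for name, emb in skill_library.items()),
        key=lambda item: item[1],
    )
    if best_sim >= confirm_threshold:
        return "execute", best_name   # confident match: run the skill
    if best_sim >= ask_threshold:
        return "confirm", best_name   # plausible match: ask the user to confirm
    return "ask_demo", None           # no good match: request a demonstration


if __name__ == "__main__":
    library = {"pick_cup": [1.0, 0.0, 0.0], "open_drawer": [0.0, 1.0, 0.0]}
    print(decide_action([1.0, 0.0, 0.0], library))  # ('execute', 'pick_cup')
    print(decide_action([0.0, 0.0, 1.0], library))  # ('ask_demo', None)
```

The middle "confirm" band reflects the abstract's idea of using dialogue to either query or confirm a requested skill, rather than always acting or always asking.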

Continual Skill and Task Learning via Dialogue