Abstract: Continual and interactive robot learning is a challenging problem because the robot works alongside human users who expect it to perpetually learn novel skills for novel tasks, and to do so sample-efficiently. In this work we present a framework for robots to query and learn visuo-motor skills and task-relevant information via natural language dialog interactions with human users. Previous approaches either focus on improving the performance of instruction-following agents or passively learn novel skills or concepts. Instead, we use dialog combined with a language-skill grounding embedding to query or confirm skills and/or tasks requested by a user. To achieve this goal, we developed and integrated three different components for our agent. First, we propose a novel visuo-motor control policy, ACT with Low Rank Adaptation (ACT-LoRA), which enables the existing SoTA ACT model to perform few-shot continual learning. Second, we develop an alignment model that projects demonstrations across skill embodiments into a shared embedding, allowing us to know when to ask users questions and/or request demonstrations. Finally, we integrate an existing LLM to interact with a human user to perform grounded interactive continual skill learning to solve a task. Our ACT-LoRA model learns novel fine-tuned skills with 100% accuracy when trained with only five demonstrations for a novel skill while still maintaining 74.75% accuracy on pre-trained skills in the RLBench dataset, where other models fall significantly short. We also performed a human-subjects study with 8 subjects to demonstrate the continual learning capabilities of our combined framework. We achieve a success rate of 75% in the task of sandwich making with the real robot learning from participant data, demonstrating that robots can learn novel skills or task knowledge from dialogue with non-expert users using our approach.

What problem does this paper attempt to address?

The paper addresses how robots can continually learn new skills while interacting with humans. Specifically, the research team proposes a framework that enables robots to query and learn new visuo-motor skills and related task information through natural language dialogue. The main problems the paper attempts to solve are as follows:

1. **Active Learning of New Skills**: Existing methods either focus on improving the performance of instruction-following agents or passively learn new skills or concepts. The proposed framework instead combines dialogue with a language-skill grounding embedding to actively query or confirm the skills and tasks requested by the user.
2. **Sample-Efficient Continual Learning**: To achieve this goal, the researchers developed and integrated three components:
   - A novel visuo-motor control policy, ACT-LoRA, which enables the existing Action Chunking Transformer (ACT) model to perform few-shot continual learning.
   - An alignment model that projects demonstrations across skill embodiments into a shared embedding space, used to decide when to ask the user a question or request a demonstration.
   - An existing large language model (LLM), integrated to interact with human users and perform grounded, interactive continual skill learning to solve a task.
3. **Experimental Validation**: The ACT-LoRA model reaches 100% accuracy on a novel skill when fine-tuned with only five demonstrations, while maintaining 74.75% accuracy on pre-trained skills in the RLBench dataset. Additionally, a human-subjects study with 8 participants showed that a real robot could learn new skills from non-expert users through dialogue, with a success rate of 75% on a sandwich-making task.

In summary, this paper is dedicated to enabling robots to actively and efficiently learn new skills through natural language dialogue, thereby better accomplishing various tasks.
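To make the low-rank adaptation idea behind ACT-LoRA more concrete, the following is a minimal NumPy sketch of a generic LoRA-style linear layer: the pre-trained weight matrix stays frozen and only a small low-rank correction is trainable. This is not the authors' implementation, and the text above does not say which ACT layers are adapted or which rank is used; the class name `LoRALinear`, the parameters `rank` and `alpha`, and the 64x64 layer size are all assumptions chosen for illustration.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer W plus a low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration of generic LoRA, not the paper's ACT-LoRA code.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight  # pre-trained weights, kept frozen
        # Trainable factors: A starts small and random, B starts at zero,
        # so the adapted layer initially behaves exactly like the frozen one.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))
        self.B = np.zeros((out_dim, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Base output plus the scaled low-rank correction.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        # Only A and B would be updated when fine-tuning on a new skill.
        return self.A.size + self.B.size


# Usage: a 64x64 layer has 4096 frozen weights but only 512 trainable ones.
W = np.random.default_rng(1).normal(size=(64, 64))
layer = LoRALinear(W, rank=4)
x = np.ones((2, 64))
base_out = x @ W.T
print(layer.num_trainable(), W.size)
```

Because the frozen weights are never modified, an adapter trained on a handful of demonstrations of a new skill can be kept separate from the base model. This is one plausible reading of how a LoRA-based policy could retain pre-trained skills while learning new ones few-shot, though the summary above does not spell out the mechanism.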

Continual Skill and Task Learning via Dialogue

Learning Robot Manipulation Skills from Human Demonstration Videos Using Two-Stream 2-D/3-D Residual Networks with Self-Attention

Dialogue Learning with Human-in-the-Loop

Learning through Dialogue Interactions by Asking Questions

Task Learning Through Visual Demonstration and Situated Dialogue

CLFR-M: Continual Learning Framework for Robots Via Human Feedback and Dynamic Memory

Vocal Sandbox: Continual Learning and Adaptation for Situated Human-Robot Collaboration

Interactive Robot Learning from Verbal Correction

Simulating User Agents for Embodied Conversational-AI

Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models

LOTUS: Continual Imitation Learning for Robot Manipulation Through Unsupervised Skill Discovery

Lifelong Robot Library Learning: Bootstrapping Composable and Generalizable Skills for Embodied Control with Language Models

Lifelong Robot Learning with Human Assisted Language Planners

Evaluating Continual Learning on a Home Robot

Interactive Visual Task Learning for Robots

ARO: Large Language Model Supervised Robotics Text2Skill Autonomous Learning

Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition

Learning Novel Skills from Language-Generated Demonstrations

Grounding Language with Visual Affordances over Unstructured Data

Continual Learning through Human-Robot Interaction -- Human Perceptions of a Continual Learning Robot in Repeated Interactions

How Do Human Users Teach a Continual Learning Robot in Repeated Interactions?