Abstract: Natural-language dialog is key to intuitive human-robot interaction. It can be used not only to express humans' intents but also to communicate instructions for improvement when a robot does not understand a command correctly. It is of great importance to endow robots with the ability to learn from such interaction experience incrementally, so that they can improve their behaviors or avoid mistakes in the future. In this paper, we propose a system to achieve incremental learning of complex behavior from natural interaction, and demonstrate its implementation on a humanoid robot. Building on recent advances, we present a system that deploys Large Language Models (LLMs) for high-level orchestration of the robot's behavior, based on the idea of enabling the LLM to generate Python statements in an interactive console to invoke both robot perception and action. The interaction loop is closed by feeding back human instructions, environment observations, and execution results to the LLM, thus informing the generation of the next statement. Specifically, we introduce incremental prompt learning, which enables the system to interactively learn from its mistakes. For that purpose, the LLM can call another LLM responsible for code-level improvements of the current interaction based on human feedback. The improved interaction is then saved in the robot's memory and retrieved on similar requests. We integrate the system into the robot cognitive architecture of the humanoid robot ARMAR-6 and evaluate our methods both quantitatively (in simulation) and qualitatively (in simulation and the real world) by demonstrating generalized, incrementally learned knowledge.
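The closed loop described in the abstract (the LLM emits one Python statement at a time, the console executes it, and the result or error is appended to the history that informs the next statement) can be sketched as follows. This is a minimal, hypothetical sketch, not the authors' implementation: `find_objects`, `grasp`, the scripted stand-in for the LLM, and the convention that the LLM returns `None` to end its turn are all assumptions made for illustration.

```python
import contextlib
import io
from typing import Callable, Optional

# Hypothetical robot API stubs; ARMAR-6's real perception/action functions differ.
def find_objects() -> list[str]:
    """Hypothetical perception call: names of objects in the scene."""
    return ["cup", "sponge"]

def grasp(obj: str) -> str:
    """Hypothetical action call that fails for objects not in the scene."""
    if obj not in find_objects():
        raise ValueError(f"unknown object: {obj}")
    return f"grasped {obj}"

def run_statement(stmt: str, namespace: dict) -> str:
    """Execute one statement and return what an interactive console would show."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            try:
                # Expressions are evaluated so their value is echoed, as in a REPL.
                value = eval(compile(stmt, "<llm>", "eval"), namespace)
                if value is not None:
                    print(repr(value))
            except SyntaxError:
                # Not an expression (e.g. an assignment): execute it as a statement.
                exec(compile(stmt, "<llm>", "exec"), namespace)
    except Exception as exc:
        # Errors become console output that is fed back to the LLM.
        buffer.write(f"{type(exc).__name__}: {exc}\n")
    return buffer.getvalue()

LLM = Callable[[list[str]], Optional[str]]

def interaction_loop(request: str, llm: LLM, max_steps: int = 10) -> list[str]:
    """Closed loop: generate a statement, execute it, append the result to the history."""
    history = [f"# user: {request}"]
    namespace = {"find_objects": find_objects, "grasp": grasp}
    for _ in range(max_steps):
        stmt = llm(history)  # a real LLM would condition on the full history
        if stmt is None:  # hypothetical convention: None ends the turn
            break
        history.append(f">>> {stmt}")
        output = run_statement(stmt, namespace)
        if output:
            history.append(output.rstrip("\n"))
    return history

def make_scripted_llm(statements: list[str]) -> LLM:
    """Stand-in for an LLM that replays fixed statements, ignoring the history."""
    remaining = iter(statements)
    return lambda history: next(remaining, None)

if __name__ == "__main__":
    llm = make_scripted_llm(["find_objects()", "grasp('plate')", "grasp('cup')"])
    for line in interaction_loop("Bring me the cup", llm):
        print(line)
```

With the scripted LLM above, the attempt to grasp a nonexistent `plate` shows up in the history as a `ValueError` line instead of crashing the loop; this is what lets a real LLM read the error and issue a corrected statement on its next step.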

What problem does this paper attempt to address?

The paper addresses the question of how robots can incrementally learn behaviors through natural-language interaction. Specifically, the goal is for a robot to understand a human's natural-language instructions and, when it performs a task incorrectly or inadequately, to learn from the human's corrective instructions and thereby gradually improve its behavior. To this end, the authors propose a system that uses large language models (LLMs) for high-level orchestration of the robot's behavior. Its main components are:

1. **Interactive Learning Framework**: The LLM interacts with a simulated Python console, generating Python code snippets that invoke the robot's perception functions or execute actions. Errors that occur during execution are fed back to the LLM, which then generates new code based on the interaction history so far.
2. **Memory Mechanism**: The robot has a memory that stores information about the current scene (such as objects and their locations) as well as past interactions (interaction histories). These experiences support future decision-making and behavior improvement.
3. **Dynamic Prompt Construction**: The system builds the prompt dynamically from the current user request: it retrieves from the robot's memory the past interaction examples most similar to the current situation and includes them in the LLM's input to guide it toward generating appropriate code.
4. **Incremental Learning**: When the robot makes a mistake or performs a task inadequately, the human can correct it with further instructions. The system can then trigger a special function that lets a second LLM analyze the entire interaction and propose code-level improvements. The improved interaction is stored in the robot's memory for future reference in similar situations.
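The memory, retrieval, and incremental-learning steps (items 2 to 4) can be sketched as below. This is a hypothetical sketch, not the authors' code: similarity is computed with a bag-of-words cosine as a stand-in for a learned text embedding, the improving LLM is replaced by any callable that rewrites a transcript given feedback, and all names (`Interaction`, `InteractionMemory`, `build_prompt`, `learn_from_feedback`) are illustrative.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Interaction:
    request: str           # the user's natural-language request
    transcript: list[str]  # console lines: generated statements and their outputs

def embed(text: str) -> Counter:
    """Bag-of-words vector; a simple stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

@dataclass
class InteractionMemory:
    examples: list[Interaction] = field(default_factory=list)

    def add(self, interaction: Interaction) -> None:
        self.examples.append(interaction)

    def retrieve(self, request: str, k: int = 2) -> list[Interaction]:
        """Return the k stored interactions whose requests are most similar."""
        query = embed(request)
        ranked = sorted(self.examples, key=lambda ex: cosine(query, embed(ex.request)), reverse=True)
        return ranked[:k]

def build_prompt(instructions: str, memory: InteractionMemory, request: str, k: int = 2) -> str:
    """Instructions, then the retrieved examples, then the new request."""
    parts = [instructions]
    for ex in memory.retrieve(request, k):
        parts.append("\n".join([f"# user: {ex.request}", *ex.transcript]))
    parts.append(f"# user: {request}")
    return "\n\n".join(parts)

Improver = Callable[[list[str], str], list[str]]

def learn_from_feedback(memory: InteractionMemory, interaction: Interaction,
                        feedback: str, improve: Improver) -> Interaction:
    """A second model (any callable here) rewrites the transcript given the human's
    feedback; the improved interaction is stored for retrieval on similar requests."""
    improved = Interaction(interaction.request, improve(interaction.transcript, feedback))
    memory.add(improved)
    return improved
```

For example, with one stored interaction for "bring me a cup" and one for "wipe the table", `retrieve("please bring me the red cup", k=1)` returns the cup example, since it shares three words with the query versus one for the table example.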
In summary, the system proposed in the paper enables robots to continuously learn and refine their behavior under natural-language guidance, and thus to accomplish tasks better.

Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models
