Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models

Leonard Bärmann, Rainer Kartmann, Fabian Peller-Konrad, Jan Niehues, Alex Waibel, Tamim Asfour
2024-05-16
Abstract: Natural-language dialog is key for intuitive human-robot interaction. It can be used not only to express humans' intents, but also to communicate instructions for improvement if a robot does not understand a command correctly. Of great importance is to endow robots with the ability to learn from such interaction experience in an incremental way to allow them to improve their behaviors or avoid mistakes in the future. In this paper, we propose a system to achieve incremental learning of complex behavior from natural interaction, and demonstrate its implementation on a humanoid robot. Building on recent advances, we present a system that deploys Large Language Models (LLMs) for high-level orchestration of the robot's behavior, based on the idea of enabling the LLM to generate Python statements in an interactive console to invoke both robot perception and action. The interaction loop is closed by feeding back human instructions, environment observations, and execution results to the LLM, thus informing the generation of the next statement. Specifically, we introduce incremental prompt learning, which enables the system to interactively learn from its mistakes. For that purpose, the LLM can call another LLM responsible for code-level improvements of the current interaction based on human feedback. The improved interaction is then saved in the robot's memory, and thus retrieved on similar requests. We integrate the system in the robot cognitive architecture of the humanoid robot ARMAR-6 and evaluate our methods both quantitatively (in simulation) and qualitatively (in simulation and real-world) by demonstrating generalized incrementally-learned knowledge.
Robotics, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how a robot can incrementally learn behaviors through natural-language interaction. The goal is for the robot to understand natural-language instructions and, when it performs a task poorly, to learn from the human's corrective instructions so that its behavior gradually improves. To this end, the authors propose a system that uses large language models (LLMs) for high-level orchestration of the robot's behavior. The system has four main components:

1. **Interactive Learning Framework**: The LLM interacts with the robot through a simulated Python console. It generates Python statements that invoke the robot's perception functions or trigger actions. Execution results and errors are fed back to the LLM, which generates the next statement based on the interaction history so far.
2. **Memory Mechanism**: The robot has a memory system that stores information about the current scene (such as objects and locations) and experiences from past interactions (the interaction history). These experiences support future decisions and behavior improvement.
3. **Dynamic Prompt Construction**: The system builds the prompt dynamically from the current user request and the interaction history. It retrieves from the robot's memory the past interaction examples most similar to the current situation and passes them to the LLM as examples that guide code generation.
4. **Incremental Learning**: When the robot makes a mistake or performs a task inadequately, the human can correct it with further instructions. The system can then call a special function that has a second LLM analyze the whole interaction and propose an improved version. The improved interaction is stored in the robot's memory and retrieved for similar requests in the future.
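The four components described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `InteractionMemory`, `build_prompt`, `run_interaction`, and `done()` are hypothetical, the LLMs are modeled as plain callables from prompt text to output text, and word-overlap similarity stands in for whatever retrieval method the real system uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Assumption: an "LLM" is any callable mapping a prompt string to output text.
LLM = Callable[[str], str]


def similarity(a: str, b: str) -> float:
    """Jaccard word overlap; a crude stand-in for a learned similarity measure."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


@dataclass
class InteractionMemory:
    """Stores past interactions as (request, console transcript) pairs."""
    examples: List[Tuple[str, str]] = field(default_factory=list)

    def store(self, request: str, transcript: str) -> None:
        self.examples.append((request, transcript))

    def retrieve(self, request: str, k: int = 2) -> List[Tuple[str, str]]:
        # Return the k stored interactions whose request is most similar.
        ranked = sorted(self.examples,
                        key=lambda ex: similarity(request, ex[0]),
                        reverse=True)
        return ranked[:k]


def build_prompt(memory: InteractionMemory, request: str, transcript: str) -> str:
    """Dynamic prompt: retrieved examples followed by the current console state."""
    parts = [f"# Example\n{t}" for _, t in memory.retrieve(request)]
    parts.append(f"# Current\n{transcript}")
    return "\n\n".join(parts)


def run_interaction(request: str, llm: LLM, improver: LLM,
                    memory: InteractionMemory, api: Dict[str, Callable],
                    max_steps: int = 10) -> str:
    """Console loop: the LLM emits one statement at a time and sees each result."""
    transcript = f">>> wait_for_command()\n{request!r}"

    def learn_from_interaction() -> str:
        # A second LLM rewrites the transcript; the result is saved to memory.
        memory.store(request, improver(transcript))
        return "stored"

    namespace = dict(api, learn_from_interaction=learn_from_interaction)
    for _ in range(max_steps):
        statement = llm(build_prompt(memory, request, transcript)).strip()
        if statement == "done()":  # hypothetical stop signal for this sketch
            break
        try:
            # eval on model output is unsafe; a real system needs sandboxing.
            result = repr(eval(statement, namespace))
        except Exception as exc:
            # Errors are fed back too, so the LLM can react to them.
            result = f"{type(exc).__name__}: {exc}"
        transcript += f"\n>>> {statement}\n{result}"
    return transcript
```

In this sketch, `api` would hold the robot's perception and action functions, including one that returns the next user utterance so that corrective feedback enters the transcript. Once the first LLM calls `learn_from_interaction()`, the improved transcript becomes a retrievable example for later, similar requests.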
In summary, the proposed system lets a robot keep learning and refining its behavior under natural-language guidance, so that it handles similar requests better in the future.