Abstract: As humans, we naturally break a task down into individual steps in daily life, and we can provide feedback or dynamically adjust the plan when we encounter obstacles. Similarly, our aim is to help agents comprehend and carry out natural language instructions more efficiently and cost-effectively. For example, in Vision-Language Navigation (VLN) tasks, the agent needs to understand instructions such as "go to the table by the fridge". Understanding this instruction allows the agent to infer that the destination is likely to be in the kitchen and to navigate to the table. The traditional VLN approach mainly trains models on large labeled datasets so that they can plan tasks in unseen environments. However, manual labeling makes this approach costly. Because large language models (LLMs) acquire extensive commonsense knowledge during pre-training, some researchers have started using LLMs as decision modules in embodied tasks. This line of work shows that LLMs can reason well enough to plan a logical sequence of subtasks from global information. However, executing the subtasks often runs into problems, such as obstacles that hinder progress and changes in the state of the target object. Even one mistake can cause the subsequent subtasks to fail, which makes it challenging to complete an instruction with a single plan. Therefore, we propose a new approach for embodied tasks that is centered on an LLM: CPMI, Correction and Planning with Memory Integration. In more detail, the auxiliary modules of CPMI facilitate dynamic planning by the LLM-centric planner. These modules provide the agent with memory and generalized experience mechanisms so that it can fully utilize the LLM's capabilities and improve its performance during execution. Finally, experimental results on public datasets demonstrate that we achieve the best performance in the few-shot scenario, improving efficiency on subsequent tasks while increasing the success rate.
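The closed loop described in the abstract (plan, execute, correct on failure, and remember the correction) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Memory` class, `run_episode`, the stub planner, corrector, and executor, and the retry limit are all hypothetical names and choices introduced only for illustration, and the stubs stand in for what would be LLM calls and an embodied environment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Memory:
    """Hypothetical store mapping a failed subtask to the correction that fixed it."""
    corrections: Dict[str, str] = field(default_factory=dict)

    def recall(self, subtask: str) -> str:
        # Reuse a remembered correction if one exists, else keep the subtask as planned.
        return self.corrections.get(subtask, subtask)

    def record(self, failed: str, fix: str) -> None:
        self.corrections[failed] = fix


def run_episode(
    instruction: str,
    planner: Callable[[str], List[str]],   # assumed stand-in for an LLM planner
    corrector: Callable[[str], str],       # assumed stand-in for an LLM correction step
    execute: Callable[[str], bool],        # assumed stand-in for the environment
    memory: Memory,
    max_retries: int = 2,                  # illustrative limit, not from the source
) -> Tuple[bool, List[Tuple[str, bool]]]:
    """Plan once, then correct and retry each failing subtask instead of aborting."""
    log: List[Tuple[str, bool]] = []
    for subtask in planner(instruction):
        step = memory.recall(subtask)
        attempts = 0
        while not execute(step):
            log.append((step, False))
            if attempts == max_retries:
                return False, log
            attempts += 1
            step = corrector(step)
            memory.record(subtask, step)  # remember the latest correction for later episodes
        log.append((step, True))
    return True, log


# Toy stubs: the direct route to the kitchen is blocked by a closed door.
def stub_planner(instruction: str) -> List[str]:
    return ["go to kitchen", "go to table by fridge"]


def stub_corrector(step: str) -> str:
    return "open door, then " + step


def stub_execute(step: str) -> bool:
    return step != "go to kitchen"


memory = Memory()
task = "go to the table by the fridge"
ok1, log1 = run_episode(task, stub_planner, stub_corrector, stub_execute, memory)
ok2, log2 = run_episode(task, stub_planner, stub_corrector, stub_execute, memory)
```

In this toy run the first episode needs one failed attempt before the correction succeeds, while the second episode reuses the remembered correction and completes without a failure, which mirrors the claim that memory improves efficiency on subsequent tasks.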

CLFR-M: Continual Learning Framework for Robots Via Human Feedback and Dynamic Memory

Continual Learning through Human-Robot Interaction -- Human Perceptions of a Continual Learning Robot in Repeated Interactions

Interactive Continual Learning: Fast and Slow Thinking

FLTRNN: Faithful Long-Horizon Task Planning for Robotics with Large Language Models

Continual Learning for Autonomous Robots: A Prototype-based Approach

Generalized Robot Learning Framework

LLM as A Robotic Brain: Unifying Egocentric Memory and Control

Continual Skill and Task Learning via Dialogue

MHRC: Closed-loop Decentralized Multi-Heterogeneous Robot Collaboration with Large Language Models

Online Active Continual Learning for Robotic Lifelong Object Recognition

Lifelong Robot Library Learning: Bootstrapping Composable and Generalizable Skills for Embodied Control with Language Models

Interactive Robot Learning from Verbal Correction

RAG-Modulo: Solving Sequential Tasks using Experience, Critics, and Language Models

A Novel Continuous Learning and Collaborative Decision Making Mechanism for Real-Time Cooperation of Humanoid Service Robots

Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models

C3F: Constant Collaboration and Communication Framework for Graph-Representation Dynamic Multi-Robotic Systems

Interactive Continual Learning Architecture for Long-Term Personalization of Home Service Robots

Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models

RIRL: A Recurrent Imitation and Reinforcement Learning Method for Long-Horizon Robotic Tasks

Leave It to Large Language Models! Correction and Planning with Memory Integration

In-Context Learning Enables Robot Action Prediction in LLMs