Abstract: Hierarchical policies that combine language and low-level control have been shown to perform impressively on long-horizon robotic tasks by leveraging either zero-shot high-level planners like pretrained language and vision-language models (LLMs/VLMs) or models trained on annotated robotic demonstrations. However, for complex and dexterous skills, attaining high success rates on long-horizon tasks still represents a major challenge: the longer the task is, the more likely it is that some stage will fail. Can humans help the robot to continuously improve its long-horizon task performance through intuitive and natural feedback? In this paper, we make the following observation: high-level policies that index into sufficiently rich and expressive low-level language-conditioned skills can be readily supervised with human feedback in the form of language corrections. We show that even fine-grained corrections, such as small movements ("move a bit to the left"), can be effectively incorporated into high-level policies, and that such corrections can be readily obtained from humans observing the robot and making occasional suggestions. This framework enables robots not only to rapidly adapt to real-time language feedback, but also to incorporate this feedback into an iterative training scheme that improves the high-level policy's ability to correct errors in both low-level execution and high-level decision-making purely from verbal feedback. Our evaluation on real hardware shows that this leads to significant performance improvement in long-horizon, dexterous manipulation tasks without the need for any additional teleoperation. Videos and code are available at

What problem does this paper attempt to address?

The paper addresses how to improve robot performance on complex, long-horizon tasks through natural-language feedback. The authors observe that for tasks requiring multiple stages, the success rate drops significantly as task length grows: each stage can fail, and these failures compound until the whole task fails. To address this, the paper proposes a system named "Yell At Your Robot" (abbreviated as YAY Robot). The system lets a robot adjust its behavior immediately in response to real-time language feedback, and it keeps learning from that feedback to improve its performance on future tasks.

### Main Contributions

1. **Real-Time Adaptation**: YAY Robot lets robots respond in real time to language instructions from humans, immediately correcting errors or adjusting behavior.
2. **Continuous Improvement**: By collecting human language feedback, the system keeps refining the high-level policy, which reduces the need for human intervention and improves autonomous performance on complex tasks.
3. **Natural Interaction**: Users guide the robot through a task directly in natural language, so non-expert users can correct and teach the robot without teleoperation.

### Technical Implementation

- **Low-Level Policy**: A Language-Conditioned Behavior Cloning (LCBC) policy, a deep neural network that learns to map visual and language inputs to action outputs.
- **High-Level Policy**: Generates the language instructions that direct the low-level policy to perform specific skills. The high-level policy is a Transformer trained on visual inputs (such as images) and historical context.
- **Feedback Integration**: During deployment, if the user judges that the robot's behavior is incorrect or needs adjustment, they can intervene with a language command. These interventions are recorded and later used to fine-tune the high-level policy.

### Experimental Verification

The paper evaluates YAY Robot on three multi-stage manipulation tasks:

1. **Bagging Task**: Put multiple items into a zippered bag.
2. **Mixed Snack Bag Preparation Task**: Scoop different ingredients with a spoon to make a mixed snack bag.
3. **Cleaning Task**: Clean gummy candies stuck to a plate.

The experimental results show that real-time language feedback significantly improves the success rate during task execution, and that continued learning from this feedback further improves the robot's autonomous performance on complex tasks.

### Conclusion

By introducing a natural-language feedback mechanism, YAY Robot both adapts in real time during complex tasks and improves its long-term performance through continued learning, which the summary presents as promising for human-robot interaction.
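The feedback-integration loop described under Technical Implementation can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `CorrectionLog`, `run_episode`, and the callables `high_level`, `low_level`, and `get_correction` are hypothetical names, observations are plain strings standing in for camera images, and the sketch assumes a correction simply replaces the high-level policy's instruction for that step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Observation = str  # hypothetical stand-in for camera images


@dataclass
class CorrectionLog:
    """Stores (observation, corrected instruction) pairs for later fine-tuning."""
    records: List[Tuple[Observation, str]] = field(default_factory=list)

    def add(self, obs: Observation, instruction: str) -> None:
        self.records.append((obs, instruction))


def run_episode(
    observations: List[Observation],
    high_level: Callable[[Observation], str],
    low_level: Callable[[Observation, str], str],
    get_correction: Callable[[Observation], Optional[str]],
    log: CorrectionLog,
) -> List[str]:
    """Run one episode; a human correction overrides the high-level policy."""
    executed = []
    for obs in observations:
        correction = get_correction(obs)
        if correction is not None:
            # Assumption: the verbal correction replaces the predicted
            # instruction and is recorded as a supervised training pair.
            instruction = correction
            log.add(obs, instruction)
        else:
            instruction = high_level(obs)
        # The low-level policy is conditioned on the chosen instruction.
        executed.append(low_level(obs, instruction))
    return executed


if __name__ == "__main__":
    log = CorrectionLog()
    actions = run_episode(
        ["frame0", "frame1"],
        high_level=lambda obs: "pick up the bag",
        low_level=lambda obs, instr: f"act[{instr}]",
        get_correction=lambda obs: "move a bit to the left" if obs == "frame1" else None,
        log=log,
    )
    print(actions)
    print(log.records)
```

In this sketch the logged pairs play the role of the recorded interventions: a later fine-tuning step could treat each pair as a supervised example (observation in, corrected instruction out) for the high-level policy, while the low-level policy stays unchanged.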

Yell At Your Robot: Improving On-the-Fly from Language Corrections
