Language-Model-Assisted Bi-Level Programming for Reward Learning from Internet Videos

Harsh Mahesheka, Zhixian Xie, Zhaoran Wang, Wanxin Jin

2024-10-12

Abstract: Learning from Demonstrations, particularly from biological experts like humans and animals, often encounters significant data acquisition challenges. While recent approaches leverage internet videos for learning, they require complex, task-specific pipelines to extract and retarget motion data for the agent. In this work, we introduce a language-model-assisted bi-level programming framework that enables a reinforcement learning agent to directly learn its reward from internet videos, bypassing dedicated data preparation. The framework includes two levels: an upper level where a vision-language model (VLM) provides feedback by comparing the learner's behavior with expert videos, and a lower level where a large language model (LLM) translates this feedback into reward updates. The VLM and LLM collaborate within this bi-level framework, using a "chain rule" approach to derive a valid search direction for reward learning. We validate the method for reward learning from YouTube videos, and the results have shown that the proposed method enables efficient reward design from expert videos of biological agents for complex behavior synthesis.

Robotics, Artificial Intelligence

What problem does this paper attempt to address?

This paper addresses how a robot can learn its reward function directly from internet videos without dedicated data preparation. Existing Learning from Demonstration (LfD) methods struggle to acquire expert data, especially when the demonstrators are biological experts such as humans and animals. Although some methods learn from internet videos, they usually require complex, task-specific pipelines to extract motion data and retarget it to the agent. The paper proposes a language-model-assisted bi-level programming framework that lets a reinforcement learning agent learn its reward function directly from internet videos, bypassing these data-preparation steps. The framework has two levels: at the upper level, a vision-language model (VLM) provides feedback by comparing the learner's behavior with expert videos; at the lower level, a large language model (LLM) translates this feedback into reward updates. The VLM and LLM collaborate within this bi-level framework, using a "chain rule" approach to derive a valid search direction for reward learning. The authors validate the method on reward learning from YouTube videos and report that it enables efficient reward design from expert videos of biological agents for complex behavior synthesis.
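The two-level loop described above can be illustrated with a toy sketch. This is not the authors' implementation: `train_policy`, `vlm_feedback`, and `llm_update_reward` are hypothetical stand-ins that replace RL training, the VLM, and the LLM with simple arithmetic and string rules, and the feature names `speed` and `height` are invented for illustration. The sketch only shows the control flow, in which textual feedback on behavior is translated into a change of the reward parameters.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RewardParams:
    # Hypothetical reward parameterization: one weight per behavior feature.
    weights: Dict[str, float] = field(
        default_factory=lambda: {"speed": 1.0, "height": 1.0}
    )


def train_policy(reward: RewardParams) -> Dict[str, float]:
    """Stand-in for RL training: resulting behavior features scale with the weights."""
    return {name: 0.5 * w for name, w in reward.weights.items()}


def vlm_feedback(behavior: Dict[str, float], expert: Dict[str, float]) -> List[str]:
    """Stand-in for the VLM: compare learner behavior with the expert, return text feedback."""
    feedback = []
    for name, target in expert.items():
        gap = target - behavior[name]
        if abs(gap) > 0.05:  # assumed tolerance for "close enough to the expert"
            feedback.append(("increase " if gap > 0 else "decrease ") + name)
    return feedback


def llm_update_reward(
    reward: RewardParams, feedback: List[str], step: float = 0.2
) -> RewardParams:
    """Stand-in for the LLM: translate text feedback into a reward-parameter update."""
    new_weights = dict(reward.weights)
    for item in feedback:
        direction, name = item.split()
        new_weights[name] += step if direction == "increase" else -step
    return RewardParams(new_weights)


def bilevel_reward_learning(
    expert: Dict[str, float], iterations: int = 20
) -> RewardParams:
    """Alternate behavior synthesis, feedback, and reward updates until no feedback remains."""
    reward = RewardParams()
    for _ in range(iterations):
        behavior = train_policy(reward)            # behavior under the current reward
        feedback = vlm_feedback(behavior, expert)  # feedback on the behavior
        if not feedback:
            break
        reward = llm_update_reward(reward, feedback)  # feedback mapped to a reward change
    return reward


if __name__ == "__main__":
    expert_features = {"speed": 1.0, "height": 0.3}  # made-up expert summary
    learned = bilevel_reward_learning(expert_features)
    print(learned.weights, train_policy(learned))
```

In this toy setting the feedback acts as a search direction in behavior space and the update step carries it over to reward space, which is one loose way to read the "chain rule" idea; the paper's actual prompts, models, and update rule are not reproduced here.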

Video-Language Critic: Transferable Reward Functions for Language-Conditioned Robotics

Video2Reward: Generating Reward Function from Videos for Legged Robot Behavior Learning

RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback

Temporal Video-Language Alignment Network for Reward Shaping in Reinforcement Learning

LGR2: Language Guided Reward Relabeling for Accelerating Hierarchical Reinforcement Learning

From Goal-Conditioned to Language-Conditioned Agents via Vision-Language Models

Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback

Video Prediction Models as Rewards for Reinforcement Learning

From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following

Language to Rewards for Robotic Skill Synthesis

Reinforcement Learning Friendly Vision-Language Model for Minecraft

Code as Reward: Empowering Reinforcement Learning with VLMs

Adaptive Language-Guided Abstraction from Contrastive Explanations

Rank2Reward: Learning Shaped Reward Functions from Passive Video

Learning Reward for Robot Skills Using Large Language Models via Self-Alignment

Vision-Language Models as a Source of Rewards

LIV: Language-Image Representations and Rewards for Robotic Control

Self-refined large language model as automated reward function designer for deep reinforcement learning in robotics

ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics

Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts