Abstract: Developing interactive systems that use natural language instructions to solve complex robotic control tasks has been a long-desired goal in the robotics community. Large Language Models (LLMs) have demonstrated exceptional abilities in handling complex tasks, including logical reasoning, in-context learning, and code generation. However, predicting low-level robotic actions using LLMs poses significant challenges. Additionally, such tasks usually require learning policies that execute diverse subtasks and combining them to attain the ultimate objective. Hierarchical Reinforcement Learning (HRL) is an elegant approach for solving such tasks, providing the intuitive benefits of temporal abstraction and improved exploration. However, HRL faces the recurring issue of non-stationarity caused by unstable lower-level primitive behaviour. In this work, we propose LGR2, a novel HRL framework that leverages language instructions to generate a stationary reward function for the higher-level policy. Since the language-guided reward is unaffected by the lower-level primitive behaviour, LGR2 mitigates non-stationarity and is thus an elegant method for leveraging language instructions to solve robotic control tasks. To analyze the efficacy of our approach, we perform empirical analysis and demonstrate that LGR2 effectively alleviates non-stationarity in HRL. Our approach attains success rates exceeding 70$\%$ in challenging, sparse-reward robotic navigation and manipulation environments where the baselines fail to make any significant progress. Additionally, we conduct real-world robotic manipulation experiments and demonstrate that LGR2 shows impressive generalization in real-world scenarios.


LGR2: Language Guided Reward Relabeling for Accelerating Hierarchical Reinforcement Learning
