Abstract: Developing interactive systems that leverage natural language instructions to solve complex robotic control tasks has been a long-desired goal in the robotics community. Large Language Models (LLMs) have demonstrated exceptional abilities in handling complex tasks, including logical reasoning, in-context learning, and code generation. However, predicting low-level robotic actions using LLMs poses significant challenges. Additionally, such tasks usually require learning policies that execute diverse subtasks and combining them to attain the ultimate objective. Hierarchical Reinforcement Learning (HRL) is an elegant approach for solving such tasks, providing the intuitive benefits of temporal abstraction and improved exploration. However, HRL faces the recurring issue of non-stationarity: the lower-level primitive keeps changing during training, so the outcomes seen by the higher-level policy are unstable. In this work, we propose LGR2, a novel HRL framework that leverages language instructions to generate a stationary reward function for the higher-level policy. Since the language-guided reward is unaffected by the lower primitive behaviour, LGR2 mitigates non-stationarity and is thus an elegant method for leveraging language instructions to solve robotic control tasks. We perform an empirical analysis and demonstrate that LGR2 effectively alleviates non-stationarity in HRL. Our approach attains success rates exceeding 70$\%$ in challenging, sparse-reward robotic navigation and manipulation environments where the baselines fail to make any significant progress. Additionally, we conduct real-world robotic manipulation experiments and demonstrate that LGR2 shows impressive generalization in real-world scenarios.
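The idea of a higher-level reward that does not depend on lower primitive behaviour can be illustrated with a minimal sketch. This is not the authors' implementation: the `HighLevelTransition` record, the `language_reward` function (a hand-written stand-in for a reward that would be derived from a language instruction), and `relabel_rewards` are all hypothetical names introduced here for illustration, and the 2D point states are an assumption.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

Vec = Tuple[float, float]


@dataclass(frozen=True)
class HighLevelTransition:
    """One higher-level replay entry (hypothetical layout)."""
    state: Vec       # state when the subgoal was issued
    subgoal: Vec     # subgoal proposed by the higher-level policy
    reward: float    # reward stored at collection time
    next_state: Vec  # state reached after the lower primitive acted


def language_reward(state: Vec, subgoal: Vec, goal: Vec) -> float:
    """Hypothetical stand-in for a reward derived from an instruction
    such as "move to the goal": negative squared distance between the
    proposed subgoal and the final goal. It never reads next_state, so
    it does not change when the lower primitive changes."""
    dx, dy = subgoal[0] - goal[0], subgoal[1] - goal[1]
    return -(dx * dx + dy * dy)


def relabel_rewards(
    buffer: List[HighLevelTransition],
    goal: Vec,
    reward_fn: Callable[[Vec, Vec, Vec], float],
) -> List[HighLevelTransition]:
    """Return a copy of the buffer with every stored reward replaced
    by the language-guided reward."""
    return [
        replace(t, reward=reward_fn(t.state, t.subgoal, goal))
        for t in buffer
    ]


goal = (3.0, 4.0)
# Same state and subgoal, but the lower primitive ended up in
# different places (e.g. early vs. late in training).
buffer = [
    HighLevelTransition((0.0, 0.0), (1.0, 2.0), 0.0, (0.5, 0.5)),
    HighLevelTransition((0.0, 0.0), (1.0, 2.0), -1.0, (2.0, 1.0)),
]
relabeled = relabel_rewards(buffer, goal, language_reward)
```

In this sketch the two transitions share a state and a subgoal but differ in `next_state`; after relabeling they carry the same reward, which is the stationarity property the abstract describes. How the actual reward function is obtained from language is not specified in the abstract and is left out here.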

Linguistic Reward-Oriented Takagi-Sugeno Fuzzy Reinforcement Learning

Competitive Takagi-Sugeno Fuzzy Reinforcement Learning

Multiple rewards fuzzy reinforcement learning algorithm in RoboCup environment

A Takagi-Sugeno Fuzzy Controller With Reinforcement Learning Part

Incorporating Perception-Based Information in Reinforcement Learning Using Computing with Words

Fuzzy-Based Adaptive Optimization of Unknown Discrete-Time Nonlinear Markov Jump Systems With Off-Policy Reinforcement Learning

Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback

FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning

Self-Refined Large Language Model as Automated Reward Function Designer for Deep Reinforcement Learning in Robotics

LongReward: Improving Long-context Large Language Models with AI Feedback

Enhancing Reinforcement Learning with Label-Sensitive Reward for Natural Language Understanding

Parsing Natural Language into Propositional and First-Order Logic with Dual Reinforcement Learning

A Reinforcement Learning Method for LQR Control Problem

LGR2: Language Guided Reward Relabeling for Accelerating Hierarchical Reinforcement Learning

Reinforcement Learning of an Interpretable Fuzzy System through a Neural Fuzzy Actor-Critic Framework for Mobile Robot Control

Fuzzy $H_{\infty }$ Control of Discrete-Time Nonlinear Markov Jump Systems via a Novel Hybrid Reinforcement $Q$-Learning Method

Suboptimal control for nonlinear slow‐fast coupled systems using reinforcement learning and Takagi–Sugeno fuzzy methods

Secrets of RLHF in Large Language Models Part II: Reward Modeling

Reinforcement structure/parameter learning for neural-network-based fuzzy logic control systems

RLingua: Improving Reinforcement Learning Sample Efficiency in Robotic Manipulations With Large Language Models

Model-Free Reinforcement Learning for Stochastic Games with Linear Temporal Logic Objectives