Abstract: Automatically synthesizing dense rewards from natural language descriptions is a promising paradigm in reinforcement learning (RL), with applications to sparse reward problems, open-ended exploration, and hierarchical skill design. Recent work has made progress by exploiting the prior knowledge of large language models (LLMs). However, these approaches suffer from important limitations: they do not scale to problems requiring billions of environment samples; or they are limited to reward functions expressible by compact code, which may require source code and have difficulty capturing nuanced semantics; or they require a diverse offline dataset, which may not exist or may be impossible to collect. In this work, we address these limitations through a combination of algorithmic and systems-level contributions. We propose ONI, a distributed architecture that simultaneously learns an RL policy and an intrinsic reward function using LLM feedback. Our approach annotates the agent's collected experience via an asynchronous LLM server, and these annotations are then distilled into an intrinsic reward model. We explore a range of algorithmic choices for reward modeling with varying complexity, including hashing, classification, and ranking models. By studying their relative tradeoffs, we shed light on questions regarding intrinsic reward design for sparse reward problems. Our approach achieves state-of-the-art performance across a range of challenging, sparse reward tasks from the NetHack Learning Environment in a simple unified process, using solely the agent's gathered experience, without requiring external datasets or source code. We make our code available at \url{URL} (coming soon).
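
As a rough illustration of the hashing variant mentioned in the abstract, the sketch below caches labels for observation captions in a lookup table while a background thread plays the role of the asynchronous annotation server. It is a minimal sketch under stated assumptions, not the ONI implementation: `fake_llm_label`, `HashedRewardModel`, the binary label, and the default reward of 0.0 for unlabeled captions are all hypothetical choices made for this example.

```python
import queue
import threading


def fake_llm_label(caption: str) -> int:
    """Hypothetical stand-in for an LLM call: 1 if the caption looks useful."""
    return int("gold" in caption or "stairs" in caption)


class HashedRewardModel:
    """Lookup table from observation captions to cached labels (assumed design)."""

    def __init__(self, labeler):
        self.labeler = labeler
        self.labels = {}              # caption -> 0/1 label
        self.queued = set()           # captions already sent for annotation
        self.pending = queue.Queue()  # captions waiting for the annotator
        self.lock = threading.Lock()

    def reward(self, caption: str) -> float:
        """Return the cached label, or 0.0 and enqueue the caption if unseen."""
        with self.lock:
            if caption in self.labels:
                return float(self.labels[caption])
            if caption not in self.queued:
                self.queued.add(caption)
                self.pending.put(caption)
        return 0.0

    def annotate_forever(self):
        """Background loop: label queued captions until a None sentinel arrives."""
        while True:
            caption = self.pending.get()
            if caption is None:
                self.pending.task_done()
                break
            label = self.labeler(caption)
            with self.lock:
                self.labels[caption] = label
            self.pending.task_done()


model = HashedRewardModel(fake_llm_label)
worker = threading.Thread(target=model.annotate_forever, daemon=True)
worker.start()

first = model.reward("You see here 5 gold pieces.")   # not yet labeled
model.pending.join()                                  # demo only: wait for the annotator
second = model.reward("You see here 5 gold pieces.")  # label is now cached

model.pending.put(None)  # stop the background annotator
worker.join()
```

In this toy setup the policy never blocks on the labeler: an unseen caption simply earns the default reward until its label arrives. The classification and ranking variants named in the abstract would replace the lookup table with a learned model, which this sketch does not attempt.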

Mnemonic Dictionary Learning for Intrinsic Motivation in Reinforcement Learning

Model-Based Reinforcement Learning Via Imagination with Derived Memory

Dual Memory Model for Experience-Once Task-Incremental Lifelong Learning

AdaMemento: Adaptive Memory-Assisted Policy Optimization for Reinforcement Learning

Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning

Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback

Image Augmentation Based Momentum Memory Intrinsic Reward for Sparse Reward Visual Scenes

In-Memory Learning: A Declarative Learning Framework for Large Language Models

Exploring Automated Keyword Mnemonics Generation with Large Language Models via Overgenerate-and-Rank

Motif: Intrinsic Motivation from Artificial Intelligence Feedback

Learning a World Model With Multitimescale Memory Augmentation

From Laws to Motivation: Guiding Exploration through Law-Based Reasoning and Rewards

A SMART Mnemonic Sounds like "Glue Tonic": Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick

Two-Memory Reinforcement Learning

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

Retentive or Forgetful? Diving into the Knowledge Memorizing Mechanism of Language Models

Dynamic Memory-based Curiosity: A Bootstrap Approach for Exploration

Intrinsically-Motivated Reinforcement Learning: A Brief Introduction

DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards

RecallM: An Adaptable Memory Mechanism with Temporal Understanding for Large Language Models

World Models with Hints of Large Language Models for Goal Achieving