Abstract: Empowered by deep neural networks, deep reinforcement learning (DRL) has demonstrated tremendous empirical success in various domains, including games, health care, and autonomous driving. Despite these advances, DRL remains data-inefficient, since learning effective policies demands vast numbers of environment samples. Recently, episodic control (EC)-based model-free DRL methods have improved sample efficiency by recalling past experiences from episodic memory. However, existing EC-based methods can suffer from a misalignment between the state and reward spaces because they neglect the retrieved (past) states, which carry extensive information; this may cause inaccurate value estimation and degraded policy performance. To tackle this issue, we introduce an efficient EC-based DRL framework with an expanded state-reward space, in which both the expanded states used as input and the expanded rewards used in training contain historical and current information. Specifically, we reuse the historical states retrieved by EC as part of the input states and integrate the retrieved Monte Carlo (MC) returns into the immediate reward of each interaction transition. As a result, our method simultaneously makes full use of the retrieved information and achieves a better evaluation of state values through a Temporal Difference (TD) loss. Empirical results on challenging Box2D and MuJoCo tasks demonstrate the superiority of our method over a recent sibling method and common baselines. Further, additional Q-value comparison experiments verify that our method is effective in alleviating Q-value overestimation.
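The expanded state-reward construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-neighbor lookup, the concatenation of current and retrieved states, the additive weighting coefficient `beta`, and all names (`EpisodicMemory`, `expand`) are assumptions made for illustration only.

```python
import numpy as np


class EpisodicMemory:
    """Hypothetical episodic memory storing states and their MC returns."""

    def __init__(self, state_dim):
        self.states = np.empty((0, state_dim))
        self.returns = np.empty(0)

    def add_episode(self, states, rewards, gamma=0.99):
        # Compute discounted Monte Carlo returns backwards over the episode.
        g, mc = 0.0, []
        for r in reversed(rewards):
            g = r + gamma * g
            mc.append(g)
        mc.reverse()
        self.states = np.vstack([self.states, np.asarray(states, dtype=float)])
        self.returns = np.concatenate([self.returns, np.asarray(mc)])

    def retrieve(self, state):
        # Assumption: retrieval is a Euclidean nearest-neighbor lookup.
        if len(self.returns) == 0:
            return np.zeros_like(state, dtype=float), 0.0
        dists = np.linalg.norm(self.states - state, axis=1)
        i = int(np.argmin(dists))
        return self.states[i], float(self.returns[i])


def expand(memory, state, reward, beta=0.1):
    """Build an expanded state and reward from current and retrieved data.

    Assumption: the expanded state is a concatenation, and the expanded
    reward adds beta times the retrieved MC return to the immediate reward.
    """
    state = np.asarray(state, dtype=float)
    past_state, past_return = memory.retrieve(state)
    expanded_state = np.concatenate([state, past_state])
    expanded_reward = reward + beta * past_return
    return expanded_state, expanded_reward
```

Under these assumptions, the expanded transitions would then be fed to an ordinary TD-based learner in place of the raw states and rewards.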

Episodic Reinforcement Learning with Expanded State-reward Space
