Abstract: Empowered by deep neural networks, deep reinforcement learning (DRL) has demonstrated tremendous empirical success in various domains, including games, health care, and autonomous driving. Despite these advances, DRL remains data-inefficient, since learning effective policies demands vast numbers of environment samples. Recently, episodic control (EC)-based model-free DRL methods have improved sample efficiency by recalling past experiences from episodic memory. However, existing EC-based methods can suffer from misalignment between the state and reward spaces because they neglect the (past) retrieved states, which carry extensive information; this may cause inaccurate value estimation and degraded policy performance. To tackle this issue, we introduce an efficient EC-based DRL framework with an expanded state-reward space, in which the expanded states used as input and the expanded rewards used in training both contain historical and current information. Specifically, we reuse the historical states retrieved by EC as part of the input states and integrate the retrieved Monte Carlo (MC) returns into the immediate reward of each interactive transition. As a result, our method simultaneously makes full use of the retrieved information and obtains a better evaluation of state values through a temporal difference (TD) loss. Empirical results on challenging Box2D and MuJoCo tasks demonstrate the superiority of our method over a recent sibling method and common baselines. Further, additional Q-value comparison experiments verify our method's effectiveness in alleviating Q-value overestimation.
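The expanded state-reward idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-neighbour lookup, the concatenation of current and retrieved states, the additive mixing weight `beta`, and all class and function names (`EpisodicMemory`, `expand`, `td_target`) are assumptions made purely for illustration.

```python
import math


class EpisodicMemory:
    """Toy episodic memory storing (state, Monte Carlo return) pairs (hypothetical)."""

    def __init__(self):
        self.states = []
        self.returns = []

    def add_episode(self, states, rewards, gamma):
        # Compute discounted MC returns by sweeping the episode backward.
        g = 0.0
        mc_returns = []
        for r in reversed(rewards):
            g = r + gamma * g
            mc_returns.append(g)
        mc_returns.reverse()
        self.states.extend(list(s) for s in states)
        self.returns.extend(mc_returns)

    def retrieve(self, state):
        # Assumed lookup rule: nearest stored state by Euclidean distance.
        if not self.states:
            return list(state), 0.0
        idx = min(range(len(self.states)),
                  key=lambda i: math.dist(self.states[i], state))
        return self.states[idx], self.returns[idx]


def expand(memory, state, reward, beta):
    """Build an expanded state and reward from the current transition.

    Assumption: the expanded state concatenates the current and retrieved
    states, and the expanded reward adds beta times the retrieved MC return.
    """
    retrieved_state, retrieved_return = memory.retrieve(state)
    expanded_state = list(state) + list(retrieved_state)
    expanded_reward = reward + beta * retrieved_return
    return expanded_state, expanded_reward


def td_target(expanded_reward, next_q, gamma, done):
    # One-step TD target computed on the expanded reward.
    return expanded_reward + (0.0 if done else gamma * next_q)


# Usage: store one short episode, then expand a new transition.
memory = EpisodicMemory()
memory.add_episode(states=[[0.0, 0.0], [1.0, 0.0]], rewards=[0.0, 2.0], gamma=0.5)
exp_state, exp_reward = expand(memory, state=[0.9, 0.0], reward=0.5, beta=0.1)
target = td_target(exp_reward, next_q=1.0, gamma=0.5, done=False)
```

In this sketch both the network input (`exp_state`) and the training signal (`exp_reward`) carry historical as well as current information, which is the property the abstract attributes to the expanded state-reward space. How the actual method weights or combines these quantities is not specified in the abstract.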

Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning

DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards

Reward Uncertainty for Exploration in Preference-based Reinforcement Learning

Never Explore Repeatedly in Multi-Agent Reinforcement Learning

Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning

Random curiosity-driven exploration in deep reinforcement learning

Never Revisit: Continuous Exploration in Multi-Agent Reinforcement Learning

Regioned Episodic Reinforcement Learning

Episodic Reinforcement Learning with Expanded State-reward Space

Successor-Predecessor Intrinsic Exploration

Distributional Reinforcement Learning for Efficient Exploration

Pseudo Value Network Distillation for High-Performance Exploration

Multimodal Reward Shaping for Efficient Exploration in Reinforcement Learning

Intrinsic Motivation Exploration Via Self-Supervised Prediction in Reinforcement Learning

Go Beyond Imagination: Maximizing Episodic Reachability with World Models

Efficient and Scalable Exploration Via Estimation-Error

BeBold: Exploration Beyond the Boundary of Explored Regions

Scheduled Intrinsic Drive: A Hierarchical Take on Intrinsically Motivated Exploration

RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning

Efficient Diversity-based Experience Replay for Deep Reinforcement Learning

Optimized Feature Extraction for Sample Efficient Deep Reinforcement Learning