Abstract:The infamous exploration-exploitation dilemma is one of the oldest and most important problems in reinforcement learning (RL). Deliberate and effective exploration is necessary for RL agents to succeed in most environments. However, until very recently even very sophisticated RL algorithms employed simple, undirected exploration strategies in large-scale RL tasks. We introduce a new optimistic count-based exploration algorithm for RL that is feasible in high-dimensional MDPs. The success of RL algorithms in these domains depends crucially on generalization from limited training experience. Function approximation techniques enable RL agents to generalize in order to estimate the value of unvisited states, but at present few methods have achieved generalization about the agent's uncertainty regarding unvisited states. We present a new method for computing a generalized state visit-count, which allows the agent to estimate the uncertainty associated with any state. In contrast to existing exploration techniques, our $\phi$-$\textit{pseudocount}$ achieves generalization by exploiting the feature representation of the state space that is used for value function approximation. States that have less frequently observed features are deemed more uncertain. The resulting $\phi$-$\textit{Exploration-Bonus}$ algorithm rewards the agent for exploring in feature space rather than in the original state space. This method is simpler and less computationally expensive than some previous proposals, and achieves near state-of-the-art results on high-dimensional RL benchmarks. In particular, we report world-class results on several notoriously difficult Atari 2600 video games, including Montezuma's Revenge.

Pseudo Value Network Distillation for High-Performance Exploration

Exploration and Anti-Exploration with Distributional Random Network Distillation

Self-Supervised Network Distillation for Exploration

Self-supervised network distillation: an effective approach to exploration in sparse reward environments

Reward Uncertainty for Exploration in Preference-based Reinforcement Learning

Distributional Reinforcement Learning for Efficient Exploration

Random curiosity-driven exploration in deep reinforcement learning

BeBold: Exploration Beyond the Boundary of Explored Regions

Never Give Up: Learning Directed Exploration Strategies

DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards

NovelD: A Simple yet Effective Exploration Criterion

Exploration in Feature Space for Reinforcement Learning

Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration

Generative Exploration and Exploitation

PreND: Enhancing Intrinsic Motivation in Reinforcement Learning through Pre-trained Network Distillation

Reward-Free Exploration for Reinforcement Learning

Enhanced Generalization through Prioritization and Diversity in Self-Imitation Reinforcement Learning over Procedural Environments with Sparse Rewards

DQN with model-based exploration: efficient learning on environments with sparse rewards

Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning

Is Exploration All You Need? Effective Exploration Characteristics for Transfer in Reinforcement Learning

Random Latent Exploration for Deep Reinforcement Learning