Abstract: The infamous exploration-exploitation dilemma is one of the oldest and most important problems in reinforcement learning (RL). Deliberate and effective exploration is necessary for RL agents to succeed in most environments. However, until very recently even very sophisticated RL algorithms employed simple, undirected exploration strategies in large-scale RL tasks. We introduce a new optimistic count-based exploration algorithm for RL that is feasible in high-dimensional MDPs. The success of RL algorithms in these domains depends crucially on generalization from limited training experience. Function approximation techniques enable RL agents to generalize in order to estimate the value of unvisited states, but at present few methods have achieved generalization about the agent's uncertainty regarding unvisited states. We present a new method for computing a generalized state visit-count, which allows the agent to estimate the uncertainty associated with any state. In contrast to existing exploration techniques, our $\phi$-$\textit{pseudocount}$ achieves generalization by exploiting the feature representation of the state space that is used for value function approximation. States that have less frequently observed features are deemed more uncertain. The resulting $\phi$-$\textit{Exploration-Bonus}$ algorithm rewards the agent for exploring in feature space rather than in the original state space. This method is simpler and less computationally expensive than some previous proposals, and achieves near state-of-the-art results on high-dimensional RL benchmarks. In particular, we report world-class results on several notoriously difficult Atari 2600 video games, including Montezuma's Revenge.
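
The abstract does not state formulas, so the following is only a minimal sketch of how a feature-space visit count and exploration bonus might be computed. It assumes binary feature vectors, a density model that factorizes over features (one Krichevsky-Trofimov estimator per feature), the pseudo-count form $\hat{N} = \rho(1-\rho')/(\rho'-\rho)$ used in density-based count methods, and a bonus of $\beta/\sqrt{\hat{N}}$. The class name `FeatureCountBonus`, its methods, and the default `beta` are hypothetical and are not taken from the paper.

```python
import math


class FeatureCountBonus:
    """Hypothetical count-based exploration bonus over binary features."""

    def __init__(self, num_features, beta=0.05):
        self.beta = beta                  # bonus scale (assumed hyperparameter)
        self.ones = [0] * num_features    # times each feature was observed active
        self.total = 0                    # number of feature vectors observed

    @staticmethod
    def _log_density(phi, ones, total):
        # Assumed factorized model: product of per-feature KT estimates.
        log_rho = 0.0
        for x, c in zip(phi, ones):
            p_one = (c + 0.5) / (total + 1.0)
            log_rho += math.log(p_one if x else 1.0 - p_one)
        return log_rho

    def pseudocount(self, phi):
        # rho: density of phi now; rho_next: density after one more
        # hypothetical observation of phi (the "recoding" probability).
        log_rho = self._log_density(phi, self.ones, self.total)
        ones_next = [c + int(bool(x)) for c, x in zip(self.ones, phi)]
        log_rho_next = self._log_density(phi, ones_next, self.total + 1)
        rho_next = math.exp(log_rho_next)
        # N = rho (1 - rho') / (rho' - rho) = (1 - rho') / (rho'/rho - 1)
        return (1.0 - rho_next) / math.expm1(log_rho_next - log_rho)

    def bonus(self, phi):
        # Rarely seen feature combinations yield small counts, hence large bonuses.
        return self.beta / math.sqrt(self.pseudocount(phi))

    def update(self, phi):
        # Record one real observation of the feature vector.
        for i, x in enumerate(phi):
            self.ones[i] += int(bool(x))
        self.total += 1


if __name__ == "__main__":
    model = FeatureCountBonus(num_features=3)
    for _ in range(10):
        model.update([1, 0, 0])
    print("familiar:", model.bonus([1, 0, 0]))
    print("novel:   ", model.bonus([0, 1, 1]))
```

In an agent loop, such a bonus would typically be added to the environment reward before the value update, with `update` called once per visited state. Because the count is computed from features rather than raw states, a state never visited before still receives a small bonus if its features are all familiar.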

Provably Efficient Exploration for RL with Unsupervised Learning

Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning

Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework

Provably Efficient Exploration in Policy Optimization

Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo

Reward-Free Exploration for Reinforcement Learning

Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

Reinforcement Learning with Probabilistically Complete Exploration

Exploration in Feature Space for Reinforcement Learning

Curiosity & Entropy Driven Unsupervised RL in Multiple Environments

Model-Free Active Exploration in Reinforcement Learning

Efficient Model-Free Exploration in Low-Rank MDPs

Success Probability of Exploration: a Concrete Analysis of Learning Efficiency

Regret Bounds and Reinforcement Learning Exploration of EXP-based Algorithms

Exploration by Maximizing Rényi Entropy for Zero-Shot Meta RL

Beyond Optimism: Exploration With Partially Observable Rewards

Light-weight probing of unsupervised representations for Reinforcement Learning

Exploration in Deep Reinforcement Learning: From Single-Agent to Multiagent Domain

Provably Efficient Exploration in Inverse Constrained Reinforcement Learning

Fast Rates for Maximum Entropy Exploration

Autonomous Exploration Under Uncertainty via Deep Reinforcement Learning on Graphs