Abstract:The infamous exploration-exploitation dilemma is one of the oldest and most important problems in reinforcement learning (RL). Deliberate and effective exploration is necessary for RL agents to succeed in most environments. However, until very recently even very sophisticated RL algorithms employed simple, undirected exploration strategies in large-scale RL tasks. We introduce a new optimistic count-based exploration algorithm for RL that is feasible in high-dimensional MDPs. The success of RL algorithms in these domains depends crucially on generalization from limited training experience. Function approximation techniques enable RL agents to generalize in order to estimate the value of unvisited states, but at present few methods have achieved generalization about the agent's uncertainty regarding unvisited states. We present a new method for computing a generalized state visit-count, which allows the agent to estimate the uncertainty associated with any state. In contrast to existing exploration techniques, our $\phi$-$\textit{pseudocount}$ achieves generalization by exploiting the feature representation of the state space that is used for value function approximation. States that have less frequently observed features are deemed more uncertain. The resulting $\phi$-$\textit{Exploration-Bonus}$ algorithm rewards the agent for exploring in feature space rather than in the original state space. This method is simpler and less computationally expensive than some previous proposals, and achieves near state-of-the-art results on high-dimensional RL benchmarks. In particular, we report world-class results on several notoriously difficult Atari 2600 video games, including Montezuma's Revenge.

Robust Exploration with Tight Bayesian Plausibility Sets

Bayesian Optimistic Optimization: Optimistic Exploration for Model-based Reinforcement Learning

OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments

Randomized Exploration for Reinforcement Learning with General Value Function Approximation

The Uncertainty Bellman Equation and Exploration

Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs

Beyond Optimism: Exploration With Partially Observable Rewards

CARE: Confidence-rich Autonomous Robot Exploration Using Bayesian Kernel Inference and Optimization

Provably Efficient Exploration in Policy Optimization

Optimistic Active Exploration of Dynamical Systems

Online Policy Optimization for Robust MDP

Approximating Pareto Frontier Through Bayesian-optimization-directed Robust Multi-objective Reinforcement Learning

MQES: Max-Q Entropy Search for Efficient Exploration in Continuous Reinforcement Learning

Exploration in Feature Space for Reinforcement Learning

The many faces of optimism - Extended version

Bayesian Exploration Networks

Dynamic Subgoal-based Exploration via Bayesian Optimization

Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling

Exploration Analysis in Finite-Horizon Turn-based Stochastic Games.

Probabilistic Constrained Reinforcement Learning with Formal Interpretability

Near-Optimal BRL using Optimistic Local Transitions