Abstract: Unsupervised reinforcement learning (RL) studies how to leverage environment statistics to learn useful behaviors without the cost of reward engineering. However, a central challenge in unsupervised RL is to extract behaviors that meaningfully affect the world and cover the range of possible outcomes, without getting distracted by inherently unpredictable, uncontrollable, and stochastic elements in the environment. To this end, we propose an unsupervised RL method designed for high-dimensional, stochastic environments based on an adversarial game between two policies (which we call Explore and Control) controlling a single body and competing over the amount of observation entropy the agent experiences. The Explore agent seeks out states that maximally surprise the Control agent, which in turn aims to minimize surprise, and thereby manipulate the environment to return to familiar and predictable states. The competition between these two policies drives them to seek out increasingly surprising parts of the environment while learning to gain mastery over them. We show formally that the resulting algorithm maximizes coverage of the underlying state in block MDPs with stochastic observations, providing theoretical backing to our hypothesis that this procedure avoids uncontrollable and stochastic distractions. Our experiments further demonstrate that Adversarial Surprise leads to the emergence of complex and meaningful skills, and outperforms state-of-the-art unsupervised reinforcement learning methods in terms of both exploration and zero-shot transfer to downstream tasks.
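The zero-sum structure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the density model, the policy architectures, or how the two policies share control of the body. The sketch assumes discrete observations and uses a hypothetical count-based density model (`CountDensityModel`) and a hypothetical helper (`adversarial_rewards`) only to show how one surprise signal could yield opposite rewards for Explore and Control.

```python
import math
from collections import Counter


class CountDensityModel:
    """Hypothetical stand-in for the Control agent's observation model.

    Assumption: observations are integers in [0, num_observations).
    The abstract does not say which density model is actually used.
    """

    def __init__(self, num_observations: int):
        self.num_observations = num_observations
        self.counts = Counter()
        self.total = 0

    def update(self, obs: int) -> None:
        # Record one more visit to this observation.
        self.counts[obs] += 1
        self.total += 1

    def surprise(self, obs: int) -> float:
        # Surprise is the negative log-probability under the model,
        # with add-one (Laplace) smoothing so unseen observations are finite.
        p = (self.counts[obs] + 1) / (self.total + self.num_observations)
        return -math.log(p)


def adversarial_rewards(model: CountDensityModel, obs: int) -> tuple[float, float]:
    """Return (explore_reward, control_reward) for one observation.

    Explore is rewarded for surprise; Control is penalized by the same
    amount, so the two rewards always sum to zero.
    """
    s = model.surprise(obs)
    return s, -s
```

For example, after the model has seen observation `0` ten times, a never-seen observation gives Explore a larger reward than the familiar one, and Control a correspondingly larger penalty. A count-based model is chosen here only for brevity; it would not scale to the high-dimensional observations the abstract targets.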

Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning

Explore and Control with Adversarial Surprise

A Mixture of Surprises for Unsupervised Reinforcement Learning

Curiosity-Driven Exploration via Latent Bayesian Surprise

Intrinsic Rewards for Exploration Without Harm From Observational Noise: A Simulation Study Based on the Free Energy Principle

Curiosity-driven Exploration by Self-supervised Prediction

Predictable Reinforcement Learning Dynamics through Entropy Rate Minimization

How to Stay Curious while Avoiding Noisy TVs using Aleatoric Uncertainty Estimation

SMiRL: Surprise Minimizing Reinforcement Learning in Unstable Environments

Adaptive Teaching in Heterogeneous Agents: Balancing Surprise in Sparse Reward Scenarios

Curiosity & Entropy Driven Unsupervised RL in Multiple Environments

The Exploration-Exploitation Dilemma Revisited: An Entropy Perspective

Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments

Reward Uncertainty for Exploration in Preference-based Reinforcement Learning

Random curiosity-driven exploration in deep reinforcement learning

Experimental Evidence that Empowerment May Drive Exploration in Sparse-Reward Environments

Never Give Up: Learning Directed Exploration Strategies

External Model Motivated Agents: Reinforcement Learning for Enhanced Environment Sampling

Improving Cooperative Multi-Agent Exploration via Surprise Minimization and Social Influence Maximization

Large-Scale Study of Curiosity-Driven Learning

A unified strategy for implementing curiosity and empowerment driven reinforcement learning