Abstract: Unsupervised reinforcement learning (RL) studies how to leverage environment statistics to learn useful behaviors without the cost of reward engineering. However, a central challenge in unsupervised RL is extracting behaviors that meaningfully affect the world and cover the range of possible outcomes, without being distracted by inherently unpredictable, uncontrollable, and stochastic elements of the environment. To this end, we propose an unsupervised RL method designed for high-dimensional, stochastic environments. It is based on an adversarial game between two policies, which we call Explore and Control, that control a single body and compete over the amount of observation entropy the agent experiences. The Explore agent seeks out states that maximally surprise the Control agent. The Control agent, in turn, aims to minimize surprise, and thereby manipulates the environment to return to familiar and predictable states. The competition between these two policies drives them to seek out increasingly surprising parts of the environment while learning to gain mastery over them. We show formally that the resulting algorithm maximizes coverage of the underlying state in block MDPs with stochastic observations, which provides theoretical backing for our hypothesis that this procedure avoids uncontrollable and stochastic distractions. Our experiments further demonstrate that Adversarial Surprise leads to the emergence of complex and meaningful skills, and outperforms state-of-the-art unsupervised reinforcement learning methods in terms of both exploration and zero-shot transfer to downstream tasks.
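The reward structure of the adversarial game can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a hypothetical per-dimension Gaussian density model that is fitted online to the observations seen during the Control phase. Each observation's log-likelihood under that model is the Control reward, so the Control return is high when observations are familiar. The Explore policy is assumed to receive the negated Control return, that is, the Control agent's total surprise. The names `RunningGaussian`, `adversarial_surprise_returns`, and `MIN_VAR` are hypothetical, and policy learning is omitted.

```python
import math
from typing import Sequence, Tuple

# Hypothetical variance floor so a perfectly predictable stream has finite log-likelihood.
MIN_VAR = 1e-3


class RunningGaussian:
    """Hypothetical diagonal Gaussian density model, updated online (Welford's algorithm)."""

    def __init__(self, dim: int) -> None:
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations per dimension

    def log_prob(self, obs: Sequence[float]) -> float:
        """Log-likelihood of obs; uses unit variance until two samples have been seen."""
        total = 0.0
        for x, mu, m2 in zip(obs, self.mean, self.m2):
            var = max(m2 / self.count, MIN_VAR) if self.count >= 2 else 1.0
            total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
        return total

    def update(self, obs: Sequence[float]) -> None:
        """Fold one observation into the running mean and squared-deviation sums."""
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])


def adversarial_surprise_returns(
    control_obs: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Return (explore_return, control_return) for one Control phase.

    Control is rewarded with log p(o_t), i.e. low surprise; Explore receives the
    negated Control return (an assumed reward split for illustration).
    """
    if not control_obs:
        return 0.0, 0.0
    density = RunningGaussian(len(control_obs[0]))
    control_return = 0.0
    for obs in control_obs:
        # Score each observation before fitting it, so the reward reflects prior familiarity.
        control_return += density.log_prob(obs)
        density.update(obs)
    return -control_return, control_return
```

In this sketch, a stream of repeated, predictable observations gives the Control agent a high return, while an erratic stream gives the Explore agent a high return. This mirrors the zero-sum competition over surprise described in the abstract.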

Unsupervised Control Through Non-Parametric Discriminative Rewards

Reinforcement Learning with Unsupervised Auxiliary Tasks

Deep Model-Based Reinforcement Learning for Predictive Control of Robotic Systems with Dense and Sparse Rewards

Unsupervised Visuomotor Control through Distributional Planning Networks

Never Give Up: Learning Directed Exploration Strategies

Learning Sparse Control Tasks from Pixels by Latent Nearest-Neighbor-Guided Explorations

Explore and Control with Adversarial Surprise

Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning

Human-level control through deep reinforcement learning

Learning Dense Reward with Temporal Variant Self-Supervision

Stabilizing Unsupervised Environment Design with a Learned Adversary

Curiosity-driven Exploration by Self-supervised Prediction

Unsupervised Learning of Goal Spaces for Intrinsically Motivated Goal Exploration

TLDR: Unsupervised Goal-Conditioned RL via Temporal Distance-Aware Representations

Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning

Learning Transparent Reward Models via Unsupervised Feature Selection

Dream to Explore: Adaptive Simulations for Autonomous Systems

Multigoal Visual Navigation With Collision Avoidance via Deep Reinforcement Learning

Unsupervised State Representation Learning in Atari

Unsupervised Representation Learning in Partially Observable Atari Games

Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control