Abstract: Deep reinforcement learning (RL) provides powerful methods for training optimal sequential decision-making agents. As collecting real-world interactions can entail additional costs and safety risks, the common paradigm of sim2real conducts training in a simulator, followed by real-world deployment. Unfortunately, RL agents easily overfit to the choice of simulated training environments, and worse still, learning ends when the agent masters the specific set of simulated environments. In contrast, the real world is highly open-ended, featuring endlessly evolving environments and challenges, which makes such RL approaches poorly suited to it. Simply randomizing over simulated environments is insufficient, as it requires making arbitrary distributional assumptions and can be combinatorially less likely to sample specific environment instances that are useful for learning. An ideal learning process should automatically adapt the training environment to maximize the learning potential of the agent over an open-ended task space that matches or surpasses the complexity of the real world. This thesis develops a class of methods called Unsupervised Environment Design (UED), which aim to produce such open-ended processes. Given an environment design space, UED automatically generates an infinite sequence, or curriculum, of training environments at the frontier of the learning agent's capabilities. Through extensive empirical studies and theoretical arguments founded on minimax-regret decision theory and game theory, the findings in this thesis show that UED autocurricula can produce RL agents exhibiting significantly improved robustness and generalization to previously unseen environment instances. Such autocurricula are promising paths toward open-ended learning systems that achieve more general intelligence by continually generating and mastering additional challenges of their own design.
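To make the minimax-regret idea concrete, here is a minimal toy sketch. It is not the thesis's method and makes strong simplifying assumptions: a finite design space of three hypothetical environment IDs, an exactly known optimal return for each, and a hypothetical ToyAgent whose return on an environment moves halfway toward the optimum each time it trains there. Regret on an environment is the optimal return minus the agent's return, and at each step the designer picks the environment with the highest regret. In practice the optimal return is unknown, so real UED methods have to estimate regret rather than compute it exactly.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical agent: its return on an env improves each time it trains there."""
    returns: dict[int, float] = field(default_factory=dict)
    learning_rate: float = 0.5  # assumed fraction of the remaining gap closed per update

    def value(self, env: int) -> float:
        # Untrained environments start at a return of 0.
        return self.returns.get(env, 0.0)

    def train_on(self, env: int, optimal: float) -> None:
        # Move the agent's return a fixed fraction of the way toward the optimum.
        current = self.value(env)
        self.returns[env] = current + self.learning_rate * (optimal - current)


def regret(agent: ToyAgent, env: int, optimal_returns: dict[int, float]) -> float:
    """Regret = optimal return on env minus the agent's current return on env."""
    return optimal_returns[env] - agent.value(env)


def minimax_regret_curriculum(
    agent: ToyAgent, optimal_returns: dict[int, float], steps: int
) -> list[int]:
    """Each step, the designer proposes the highest-regret env and the agent trains on it."""
    curriculum = []
    for _ in range(steps):
        # max() returns the first maximal env in dict order, so ties go to lower IDs here.
        env = max(optimal_returns, key=lambda e: regret(agent, e, optimal_returns))
        curriculum.append(env)
        agent.train_on(env, optimal_returns[env])
    return curriculum


if __name__ == "__main__":
    agent = ToyAgent()
    optimal = {0: 1.0, 1: 2.0, 2: 4.0}  # hypothetical optimal returns per env
    print(minimax_regret_curriculum(agent, optimal, steps=6))  # → [2, 1, 2, 0, 1, 2]
    print(max(regret(agent, e, optimal) for e in optimal))  # → 0.5
```

Because the designer always targets the largest remaining gap, training does not linger on environments the agent has already mastered, and the worst-case regret across the design space shrinks from 4.0 to 0.5 over six steps. Uniform domain randomization, by contrast, would sample every environment equally often regardless of how much the agent still has to learn there.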

Training Agents using Upside-Down Reinforcement Learning

Reinforcement Learning Upside Down: Don't Predict Rewards -- Just Map Them to Actions

Upside-Down Reinforcement Learning for More Interpretable Optimal Control

Stabilizing Unsupervised Environment Design with a Learned Adversary

Hierarchical Reinforcement Learning in Complex 3D Environments

Deep Reinforcement Learning in Nonstationary Environments With Unknown Change Points

Wasserstein Unsupervised Reinforcement Learning

Decentralized Multi-Agent Reinforcement Learning with Global State Prediction

Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning

Learn to Interpret Atari Agents

Learning Curricula in Open-Ended Worlds

Reinforcement Learning with Unsupervised Auxiliary Tasks

On Training Flexible Robots using Deep Reinforcement Learning

Hierarchical Reinforcement Learning from Demonstration via Reachability-Based Reward Shaping

Reinforcement Learning for Uplift Modeling

CLUTR: Curriculum Learning via Unsupervised Task Representation Learning

How to Train Your Robot with Deep Reinforcement Learning; Lessons We've Learned

Positive-Unlabeled Reward Learning

Learning Sparse Control Tasks from Pixels by Latent Nearest-Neighbor-Guided Explorations

Unsupervised Control Through Non-Parametric Discriminative Rewards

PDRL: Multi-Agent based Reinforcement Learning for Predictive Monitoring